
Sampling and predicting geographic areas using participatory sensing


IT 15083

Degree project, 30 credits

December 2015

Sampling and predicting geographic

areas using participatory sensing

Wei Wang

Department of Information Technology


Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Fax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student

Abstract


Participatory sensing is the concept that people contribute information they have independently retrieved from the environment using sensors, in order to build a whole body of knowledge. With the popularity of mobile devices such as smart phones, which have multiple sensors and wireless interfaces, “participatory sensing” has become feasible on a large scale. Spatial sampling is a technique that uses a limited number of geographical samples to achieve high credibility in measurement and then predicts data values for unsampled areas. In this paper, participatory sensing is combined with spatial sampling and prediction, and evaluated under various scenarios.

An approach is designed that samples spatial data through participatory sensing, predicts values for the unsampled areas, and evaluates the prediction results. A Java system prototype is implemented based on the design. Perlin noise and the ONE simulator are used to simulate spatial sampling with participatory sensing. Three prediction algorithms are applied: Voronoi diagram, Delaunay triangulation with gradient, and ordinary Kriging. The predictions are evaluated by the pixel-wise root-mean-square error between the true map and the predicted map. The results of the experiments indicate that the Voronoi diagram generally has a larger error than Delaunay triangulation with gradient when only a few samples are available. Ordinary Kriging produces the most accurate results, but it has the highest time complexity and requires a large number of samples to achieve high accuracy. In addition, more evenly distributed samples lead to more accurate predictions. Given proper guidance, participants in participatory sensing can considerably improve the quality of spatial sampling.

Printed by: Reprocentralen ITC
IT 15083

Examiner: Edith Ngai

Content

Chapter 1. Introduction
1.1 Background and motivation
1.2 Problem description
1.3 Thesis structure

Chapter 2. Relevant research
2.1 Spatial sampling
2.2 Participatory Sensing
2.3 Participatory sensing in the context of disasters

Chapter 3. Experiment design
3.1 Experiment scenario
3.2 Experiment design
3.3 Technology roadmap

Chapter 4. Techniques of sampling spatial data
4.1 Perlin noise
4.1.1 Why Perlin noise?
4.1.2 Noise
4.1.3 What is Perlin noise?
4.2 The ONE simulator
4.2.1 What is the ONE simulator?
4.2.2 Why the ONE simulator?
4.2.3 The ONE simulator in system prototype

Chapter 5. Prediction Methods
5.1 Voronoi diagram
5.1.1 What is Voronoi diagram
5.1.2 Algorithm of Voronoi diagram
5.2 Delaunay triangulation with gradient
5.2.1 Delaunay triangulation
5.2.2 Barycentric coordinates
5.2.3 Implementation of barycentric coordinates
5.2.4 Delaunay triangulation with gradient
5.3 Kriging
5.3.1 Regionalized variables
5.3.2 Variogram function
5.3.3 Kriging

Chapter 6. Evaluation of prediction methods
6.1 Grayscale image
6.2 RMSE

Chapter 7. System prototype
7.1.1 System prototype development environment
7.1.2 System prototype structure
7.1.3 System prototype process design
7.2 Input
7.2.1 Default settings of simulation
7.2.2 Simulation time
7.2.3 Update interval
7.2.4 Number of hosts
7.2.5 Moving speed
7.2.6 Perlin noise scale
7.3 Output

Chapter 8. Data collection and analysis
8.1 Results analysis - simulation time
8.2 Results analysis - update interval
8.3 Results analysis - number of hosts
8.4 Results analysis - Moving speed
8.5 Results analysis - Scale of Perlin noise

Chapter 9. Conclusion and future work
9.1 Conclusion
9.2 Future work

Reference

Chapter 1. Introduction

1.1 Background and motivation

With the popularity of mobile devices, especially smart phones, more and more functions have been developed for this handheld equipment. With the help of the mobile internet, smart phones can capture, transmit and store texts, images, locations and other kinds of data, interactively and autonomously [1]. Such wireless devices usually have relatively low battery and memory capacity, but they can interface with infrastructure easily. With integrated sensors, smart phones can act as sensor nodes in a wireless network. Participatory sensing is the concept that people contribute information they have retrieved independently from the environment using sensors to build a whole body of knowledge [2]. Thanks to the widespread adoption of mobile devices, constructing a large-scale participatory sensing network has become possible [3].

Natural disasters are a much-debated subject among scientists all over the world, not only because their arrival is difficult to predict precisely, but also because the resulting destruction brings huge economic losses, widespread homelessness and many deaths. According to the “Annual Disaster Statistical Review 2011”, in 2011 natural disasters killed 30,773 people and caused 244.7 million victims worldwide. The estimated economic losses from natural disasters were US$ 366.1 billion [4]. These figures cover only registered disasters and do not count the many smaller ones. How to obtain the latest useful information from a disaster area, in order to save lives in the critical period after a disaster, is therefore an important topic. Monitoring the status of disaster areas is also important for rehabilitation work.

Participatory sensing leverages the wider public, with each participant collecting a small, separate piece of data. The purpose is to integrate the separate datasets into overall information for an area. Usually, the whole body of knowledge has much greater accuracy and scope than any single piece of it [2]. For example, an earthquake could destroy roads, bridges and buildings in an area, while planning a safe escape or rescue route requires an up-to-date traffic map of the whole disaster area. If people could use smart phones to report the traffic status at their locations and so create an up-to-date traffic map, more lives could be saved. Using a participatory sensing network to help people in disaster areas is therefore a meaningful and worthwhile task.

 

... contamination. Local residents who have radiation sensors embedded in their smart phones act as moving nodes in the participatory sensing network. The smart phones collect radiation data periodically and store it locally. The data can be uploaded to a base station once per day or transferred in real time via a wireless network. Applying a suitable prediction method to the collected data generates a predicted radioactive contamination map.

The research leverages the public to collect radioactive contamination data, as local residents are likely to be willing to participate for the sake of their own safety. This helps scientists gather a large amount of first-hand data with less time and cost. Furthermore, the predicted radioactive contamination maps help to predict how the contamination moves over time, and they support rehabilitation work.

1.2 Problem description

This paper focuses on using a few samples, collected through participatory sensing, to predict spatial data for a whole area. An approach is designed and implemented for this purpose, which could also benefit research in geography. Different sample distributions are generated by applying different values to the participatory sensing parameters, which simulates different data sampling situations. The analyzed results can be used to guide participants to collect more valuable data and to make participatory sensing more efficient in time and cost.

The thesis addresses the following questions:

1. Within a certain area, which factors of the sampling process affect the sample distribution?

2. To what extent does each factor affect the sample distribution?

3. With different numbers and distributions of samples, which prediction method is more suitable?

4. How can participants be guided to collect more valuable data?

Figure 1.1 Procedures for predicting a radiation map

In this paper, different values for the participatory sensing parameters are applied to generate different distributions of samples. With the same set of samples, three prediction methods are applied to produce predicted maps: Voronoi diagram, Delaunay triangulation with gradient, and ordinary Kriging. Conclusions are then drawn from the prediction results from several perspectives.

1.3 Thesis structure

Chapter 2 describes research related to spatial data and participatory sensing. In this paper, spatial data is the research content, and participatory sensing is the research background and approach. Section 2.1 introduces the concepts of spatial data and spatial sampling. Section 2.2 describes what participatory sensing is, the differences between a wireless sensor network and a participatory sensing network, and the premise that makes participatory sensing feasible on a large scale.

Chapter 3 introduces the experiment scenario and experiment design. The problem to be solved is measuring the radioactive contamination level in Fukushima and nearby places. In this chapter, an approach is designed that simulates participatory sensing in order to sample and predict the radioactive contamination level in Fukushima and nearby places, and the related technology roadmap is proposed.

Chapter 4 discusses the techniques applied in sampling spatial data. The following questions are discussed in Chapter 4: (a) What is Perlin noise? (b) Why use Perlin noise instead of true data? (c) What is the ONE simulator? (d) Why use the ONE simulator to simulate the sampling process instead of real sampling?

In Chapter 5, the three prediction methods mentioned in Section 1.2 are introduced.

In order to evaluate the performance of the different prediction methods, the system prototype uses grayscale images and the root-mean-square error (RMSE) to compare the true and predicted spatial data maps. They are introduced in Chapter 6.

Chapter 7 introduces the design and implementation of the system prototype and the inputs of the experiments.

Chapter 8 presents the results of the experiments for the inputs introduced in Section 7.2. Each section shows the results of varying one variable while the other variables are fixed, and analyzes the three prediction algorithms from several perspectives.

Chapter 2. Relevant research

The popularization of mobile devices prompts people to develop more and more functions for them. With the help of the mobile internet, smart phones can capture, transmit and store texts, images, locations and other kinds of data, interactively and autonomously [1]. These features make it possible to construct a large-scale participatory sensing network.

Radioactive contamination is a kind of data that needs to be identified by geographic location and stored as coordinates or topology. Such data is called spatial data [6]. Spatial data is used to describe the properties of objects. A related concept is spatial sampling, which is the theoretical foundation for predicting spatial data from only a few samples [7]. Research on spatial data has long been an active topic in geostatistics [8]. As early as 1951, Danie G. Krige tried to solve mine valuation problems with statistical methods, which became the basis of Kriging [9].

In this paper, spatial data is the research content, and participatory sensing is the research background and approach. Section 2.1 introduces the concepts of spatial data and spatial sampling. Section 2.2 describes what participatory sensing is, the differences between a wireless sensor network and a participatory sensing network, and the premise that makes participatory sensing feasible on a large scale.

2.1 Spatial sampling

As mentioned above, all the data and geographic information discussed in this paper are spatial data. Spatial data is used to describe the location, shape, size, distribution and other information of spatial objects in the real world [6]. For example, radioactive contamination, humidity and temperature are all spatial data. Each spatial data value has a unique location in a spatial coordinate system and describes a property of the object at that location. Nowadays, spatial data is widely used in daily life, for example in transportation, urban planning, information communication, aerospace and satellite positioning [10].

In this paper, we start with the story of radioactive contamination in Fukushima, which is an example of spatial data. On March 11, 2011, a magnitude-9 earthquake and tsunami struck Fukushima. It crippled the Fukushima Daiichi nuclear plant, which led to a massive release of radiation into the atmosphere and the ocean. It is known that long exposure to radiation is harmful to living creatures and that the damage is irreversible [13]. Daily monitoring of the radiation level in Fukushima and nearby places is therefore very important to local residents.

2.2 Participatory Sensing

With the popularity of mobile devices such as smart phones, which have multiple sensors and wireless interfaces, “participatory sensing” has become feasible on a large scale. Participatory sensing is the concept that people contribute information they have retrieved independently from the environment using sensors to build a whole body of knowledge. Such wireless devices usually have relatively low battery and memory capacity, but they can interface with infrastructure easily. Because smart phones are location-aware, they can act as spatial sensor nodes in a wireless network [1].

Wireless sensor network research has investigated integrating sensing, uploading and computation in sensors to collect data, motivated by military, industrial and scientific applications. The difference between a wireless sensor network (WSN) and a participatory sensing network (PSN) is that sensors in a PSN are generally controlled individually by users [2]. To be precise, the sensors in a WSN are pre-deployed and centrally controllable devices, whereas the sensors in a PSN are controlled by the participants themselves and behave like free-moving, always-on sensors. Usually, the number of sensors in a PSN is much larger than in a WSN, and the data is collected for purposes in the public sphere. In short, participatory sensing leverages the public to build an interactive and participatory network.

Usually, sensors are deployed by organizations to members in order to sense information from the environment. To encourage the public to take part, participants are usually paid or benefit from the sensing results [2]. For example, the radioactive contamination around Fukushima is life-threatening and closely bound up with local residents' daily life. The results of sensing radioactive contamination are important to the public, which enables participatory sensing on a large scale.

In summary, participatory sensing is a concept that involves civic engagement, data collection, computational thinking, math and science. It leverages the public to collect data and contribute to the whole body of knowledge.

 

2.3 Participatory sensing in the context of disasters

... humans who experience or are victims of a disaster often react more quickly than any government or organization [14][15]. In addition, they can provide information on the extent of the damage and the evolution of the disaster. With the help of smart phones and cloud services, messages and images related to the disaster are easily spread to the outside world. This dynamic, real-time data is critical for building information about shelter locations, family tracing and missing people [14]. Combining pieces of information into a real-time situation map gives people more awareness of the current situation and contributes to better decision-making. In the context of disasters, a thorough combination of spatial sampling and a participatory sensing network is a promising and meaningful solution.

One research direction concentrates on spatial sampling [15], for example on improving the accuracy of spatial sampling. Another direction concentrates on the network [14]: how to make sure that data is transferred with integrity and in time, how to transfer different types of data in an ad-hoc network, and how to avoid traffic congestion in the ad-hoc network. Other work focuses on disaster management [14][15]. Its topics involve civic engagement and collaboration between the different roles in a disaster: how to use social media, such as smartphone applications, to build a bridge between on-site human sensors and the responders of related organizations and governments; how to effectively collect useful information and build a complete map for rescue during a disaster and rehabilitation after it; how to rapidly deliver useful messages to the corresponding decision makers and responders; and how to use current information to make more reliable decisions.

This thesis focuses on the integration of spatial sampling and participatory sensing, in order to draw conclusions about which aspects affect the prediction results and how to guide people in participatory sensing to generate more meaningful information. In addition, a system architecture is designed and implemented as a proof of concept.

Chapter 3. Experiment design

3.1 Experiment scenario

A magnitude-9 earthquake and tsunami struck Fukushima on March 11, 2011. It crippled the Fukushima Daiichi nuclear plant, which led to a massive release of radiation into the atmosphere and the ocean. Monitoring the radiation level in Fukushima and nearby places became a regular task for the Japanese government and the public [16]. It is known that long exposure to radiation is harmful to living creatures and that the damage is irreversible. In this case, scientists tried to predict the radiation level for the whole area by collecting only a few samples.

This method is clearly easier than sampling everywhere in the area in terms of safety, efficiency and cost. Besides, sampling everywhere is impossible under some circumstances, for example when the area is extremely large or some spots are out of reach. Using a few samples to predict the corresponding spatial data for a whole area is therefore preferable.

Figure 3.1 shows the procedure for obtaining a predicted radiation level map for places near Fukushima from a few samples. First, samples are collected from the contaminated area. The grayscale image (A) is a simulated radiation map of the contaminated area. The different gray levels, varying from black at the weakest intensity to white at the strongest, represent the different levels of radioactive contamination [17]. The red spots in image (B) are the samples (radioactive contamination data) collected at those places. Second, the radioactive contamination level at the other places in the area is predicted using only the samples gathered in step one. Image (C) is the predicted radioactive contamination map, generated by a prediction method from the samples shown in image (B).
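The grayscale encoding described above can be sketched in Java as follows. This is only an illustration: the class name `GrayscaleMapper`, the 8-bit gray range and the linear mapping are assumptions and are not taken from the thesis.

```java
/** Hypothetical helper: linear mapping from a data value to an 8-bit gray level. */
final class GrayscaleMapper {

    /** Maps a value in [min, max] to 0..255 (0 = black, 255 = white); out-of-range values are clamped. */
    static int toGray(double value, double min, double max) {
        if (!(max > min)) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        double normalized = (value - min) / (max - min);
        normalized = Math.max(0.0, Math.min(1.0, normalized));
        return (int) Math.round(normalized * 255.0);
    }

    public static void main(String[] args) {
        // Weakest, middle and strongest intensity on an assumed 0..1 scale.
        for (double v : new double[] {0.0, 0.5, 1.0}) {
            System.out.println(v + " -> " + toGray(v, 0.0, 1.0));
        }
    }
}
```

With such a mapping, the weakest contamination value is drawn black and the strongest white, as in image (A).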

 

3.2 Experiment design

In this experiment, our approach predicts spatial data based on participatory sensing and calculates the root-mean-square error (RMSE) to evaluate the performance of the different algorithms [18]. The approach is applicable to sampling and predicting different types of scalar spatial data, for example temperature, humidity, traffic conditions and environmental pollution. In order to evaluate the algorithms precisely, the Perlin noise technique is used to generate a “ground truth” map. Perlin noise is a procedural texture generation technique [19]. The Opportunistic Network Environment (ONE) simulator is used to simulate the sampling procedure [20].

In this paper, the experiment area is 500 square kilometers, which is approximately the size of a medium or large city. Perlin noise maps of 500*500 pixels are generated to simulate the radioactive contamination levels in this area. Random waypoint is chosen as the mobility model of the participants, since it is one of the most fundamental and widely used movement models. In the experiment, the number of participants, the simulation time, the update interval and the participants' moving speed are varied.
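To make the random waypoint model concrete, the following is a minimal Java sketch of a single participant who repeatedly picks a uniformly random destination in a square area, walks towards it at constant speed, and records its position once per update interval. It is not the ONE simulator's implementation: the class `RandomWaypointSampler`, its parameters and the absence of pause times are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical random waypoint sketch for one participant (no pause times). */
final class RandomWaypointSampler {

    /** Returns one {x, y} position per update interval for a square area of the given size. */
    static List<double[]> samplePositions(double size, double speed, double interval,
                                          int steps, long seed) {
        if (size <= 0 || speed < 0 || interval <= 0) {
            throw new IllegalArgumentException("size and interval must be > 0, speed >= 0");
        }
        Random rng = new Random(seed);
        double x = rng.nextDouble() * size, y = rng.nextDouble() * size;
        double destX = rng.nextDouble() * size, destY = rng.nextDouble() * size;
        List<double[]> samples = new ArrayList<>();
        for (int step = 0; step < steps; step++) {
            samples.add(new double[] {x, y});       // take a sample at the current position
            double remaining = speed * interval;    // distance travelled until the next sample
            while (remaining > 0.0) {
                double dx = destX - x, dy = destY - y;
                double dist = Math.hypot(dx, dy);
                if (dist <= remaining) {            // waypoint reached: choose a new one
                    x = destX;
                    y = destY;
                    remaining -= dist;
                    destX = rng.nextDouble() * size;
                    destY = rng.nextDouble() * size;
                } else {                            // move along the straight line to the waypoint
                    x += dx / dist * remaining;
                    y += dy / dist * remaining;
                    remaining = 0.0;
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<double[]> samples = samplePositions(500.0, 1.5, 10.0, 100, 1L);
        System.out.println("collected " + samples.size() + " sample positions");
    }
}
```

In this sketch, a longer simulation time or a shorter update interval yields more samples, while a higher speed spreads consecutive samples further apart, which is how these parameters influence the sample distribution.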

 

Figure 3.2 shows the true map of the radiation level (A), the true map with samples (B) and the predicted map (C). The experiment is divided into three procedures: sampling (1), prediction (2) and algorithm evaluation (3). Procedure (3) compares the true map (A) with the predicted map (C).

Figure 3.2 Experiment design: sampling, prediction and evaluation
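The evaluation step (3) can be sketched as follows, assuming that both maps are stored as arrays of equal size and that the RMSE is computed over all pixels as sqrt((1/N) * sum((t_i - p_i)^2)), where t_i is a true value and p_i the corresponding predicted value. The class `MapRmse` is a hypothetical name, not the prototype's code.

```java
/** Hypothetical helper: pixel-wise RMSE between a true map and a predicted map. */
final class MapRmse {

    static double rmse(double[][] truth, double[][] predicted) {
        if (truth.length == 0 || truth.length != predicted.length) {
            throw new IllegalArgumentException("maps must be non-empty and have the same height");
        }
        double sumSquares = 0.0;
        long count = 0;
        for (int y = 0; y < truth.length; y++) {
            if (truth[y].length != predicted[y].length) {
                throw new IllegalArgumentException("rows must have the same width");
            }
            for (int x = 0; x < truth[y].length; x++) {
                double diff = truth[y][x] - predicted[y][x];
                sumSquares += diff * diff;   // accumulate the squared per-pixel error
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("maps must contain at least one pixel");
        }
        return Math.sqrt(sumSquares / count);
    }

    public static void main(String[] args) {
        double[][] truth = {{0.0, 0.0}, {0.0, 0.0}};
        double[][] predicted = {{1.0, 1.0}, {1.0, 1.0}};
        System.out.println("RMSE = " + rmse(truth, predicted));
    }
}
```

A lower RMSE means that the predicted map (C) is closer to the true map (A); a value of 0 means the two maps are identical.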

3.3 Technology roadmap

... prediction and evaluation. The techniques mentioned in Figure 3.3 are introduced in Chapters 4, 5 and 6.

Figure 3.3 Technology roadmap

In order to carry out all the procedures shown in Figure 3.2, a prototype system is implemented in Java. The prototype system is designed to perform the following tasks:

1. Produce Perlin noise maps.

2. Integrate with the ONE simulator and sample data according to the input parameters.

3. Implement three different prediction algorithms: Voronoi diagram, Delaunay triangulation with gradient, and ordinary Kriging.

4. Compare the ground truth with the predicted map using the RMSE to understand how well the prediction works.

5. Provide a batch mode that produces results for different inputs.
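As an illustration of task 3, the simplest of the three methods can be sketched in a few lines. The sketch assumes that Voronoi-based prediction gives every pixel the value of its nearest sample, which is the value shared by all points of that sample's Voronoi cell. The brute-force search and the class `NearestSamplePredictor` are choices made for this example, not the prototype's algorithm.

```java
/** Hypothetical sketch: Voronoi-style prediction by copying the nearest sample's value. */
final class NearestSamplePredictor {

    /** Each sample is {x, y, value}; returns a height x width predicted map. */
    static double[][] predict(double[][] samples, int width, int height) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double[][] map = new double[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double best = Double.POSITIVE_INFINITY;
                for (double[] s : samples) {
                    double dx = s[0] - x, dy = s[1] - y;
                    double d2 = dx * dx + dy * dy;   // squared distance is enough for comparison
                    if (d2 < best) {
                        best = d2;
                        map[y][x] = s[2];
                    }
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        double[][] samples = {{0, 0, 1.0}, {3, 0, 5.0}};
        double[][] map = predict(samples, 4, 1);
        System.out.println(java.util.Arrays.toString(map[0]));
    }
}
```

The brute-force search costs one distance computation per pixel and sample, which is acceptable for a 500*500 map with a moderate number of samples; a real implementation might construct the diagram explicitly instead.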

Chapter 4. Techniques of sampling spatial data

In this paper, the approach is based on a participatory sensing network: Perlin noise is used to simulate spatial data, which is then sampled and predicted. Prediction techniques are designed, implemented and evaluated. The approach is shown in Figure 3.2. The grayscale image (A) is a true map of one kind of spatial data for a square area, and the grayscale value of each pixel represents the spatial data value at that point. The red points in image (B) are samples. Image (C) is the prediction based only on the samples gathered in image (B). The evaluation compares the true map (A) with the predicted map (C) pixel by pixel. The paper follows Figure 3.2 in introducing the related techniques and the implemented experiment.

According to Figure 3.2, the project is divided into three stages: sampling, prediction and evaluation. Chapter 4 discusses the techniques applied in sampling spatial data, the process marked as (1) in Figure 3.2. The following questions are discussed in Chapter 4: (a) What is Perlin noise? (b) Why use Perlin noise instead of true data? (c) What is the ONE simulator? (d) Why use the ONE simulator to simulate the sampling process instead of real sampling?

4.1 Perlin noise

Perlin noise is a procedural texture generation technique [19]. It is a type of noise that appears smooth and looks natural, and it is widely used in computer graphics. Especially in real-time computer games, Perlin noise is used to trade computation time for storage space [21]. In this paper, Perlin noise is used to simulate the spatial data. The reasons for using Perlin noise instead of real spatial data are stated in Section 4.1.1. Section 4.1.2 introduces noise functions, and Section 4.1.3 introduces Perlin noise and the steps to produce it.

4.1.1 Why Perlin noise?

... unavailable, because of the huge investment of time and resources required. In some special cases, sampling an area is impracticable. In contrast, Perlin noise is produced by a computer program. It is quick, cheap and easily customized by varying its variables [23]. Besides, Perlin noise maps are repeatable. This “pseudo-random” feature makes Perlin noise a well-suited subject for assessing prediction methods [24]. Finally, thousands of different Perlin noise maps can be generated under the same conditions by varying the value of the random seed. To be precise, the experiments are run on multiple Perlin noise maps with different appearances but the same variable settings. Averaging over the repeated experiments gives a more reliable result.

4.1.2 Noise

Noise is a primitive texture. It can be used to create a wide variety of natural-looking textures. Combining noise in different mathematical expressions generates procedural textures; the underlying primitive is called a noise function in mathematics [23]. A noise function has three features, which are shared by Perlin noise.

Pseudo-random

Images produced by a noise function appear to be random, but they are not truly random. The generated noise looks random and irregular, but given the same input, the noise function will produce the same output [23]. This feature makes experiments based on Perlin noise repeatable.

R^n to R

Noise is a mapping from R^n to R, where n is the dimension of the space. Given an n-dimensional real coordinate in the space, the noise function returns a real value. The most commonly used dimensions are n=1, n=2 and n=3. Given a space, every coordinate in the space has a corresponding value. This feature is shared with spatial data, and it is the theoretical foundation for using Perlin noise to simulate spatial data.

Band-limited

If noise is regarded as a signal, almost all of its energy is concentrated in a small part of the frequency spectrum. In other words, the very high and very low frequencies contribute very little energy [23]. This property is shared with many natural phenomena, and it is the theoretical basis for the natural appearance of Perlin noise.

4.1.3  What  is  Perlin  noise?  

Perlin noise is a procedural texture technique. It was developed by Ken Perlin, who received a Technical Achievement Award from the Academy of Motion Picture Arts and Sciences in 1997 [24]. Perlin noise is a type of gradient noise that is used to increase the appearance of realism in computer graphics. The function generates a pseudo-random appearance, as described in Section 4.1.2: given the same input, the Perlin noise function produces the same output. This property makes it easy to control and makes the results repeatable. Multiple scaled copies of Perlin noise can be combined in mathematical expressions to generate a variety of procedural textures [24]. Images generated from Perlin noise simulate natural patterns with high quality; most textures found on natural objects, such as smoke, cloth, fire and marble, can be produced from it. Figure 4.1 shows computer graphics generated with Perlin noise.

Figure  4.1  Computer  graphics  generation  based  on  Perlin  noise    


Figure  4.2  Perlin  noise  generations  in  scale  1  (A)  and  scale  4  (B)  

With the same scale, different appearances of Perlin noise are generated by changing the random seed. In this project, the seeds 1, 10, 100, 1000 and 10000 are chosen to reduce the impact of any specific Perlin noise generation and to give relatively reliable results. Perlin noise generations in scale 4 with seed = 1 and seed = 10 are shown in Figure 4.3.
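The role of the random seed can be illustrated with a minimal Python sketch of seeded 2D gradient (Perlin-style) noise. It is not the generator used in the project: the function names, the lattice size of 16 and the interpretation of `scale` as a multiplier on the sampling coordinates are assumptions made for illustration only.

```python
import math
import random


def make_gradients(seed, size=16):
    """Build a size x size lattice of unit gradient vectors from a seed."""
    rng = random.Random(seed)
    grid = []
    for _ in range(size):
        row = []
        for _ in range(size):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            row.append((math.cos(angle), math.sin(angle)))
        grid.append(row)
    return grid


def fade(t):
    """Perlin's smoothing curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + t * (b - a)


def perlin(x, y, gradients):
    """Gradient noise at (x, y); the lattice wraps around at its borders."""
    size = len(gradients)
    x0, y0 = math.floor(x), math.floor(y)

    def corner(ix, iy):
        gx, gy = gradients[iy % size][ix % size]
        # Dot product of the corner gradient with the offset to (x, y).
        return gx * (x - ix) + gy * (y - iy)

    u, v = fade(x - x0), fade(y - y0)
    top = lerp(corner(x0, y0), corner(x0 + 1, y0), u)
    bottom = lerp(corner(x0, y0 + 1), corner(x0 + 1, y0 + 1), u)
    return lerp(top, bottom, v)


def noise_map(width, height, scale, seed):
    """Sample a width x height map; a larger scale gives finer detail."""
    gradients = make_gradients(seed)
    return [[perlin(scale * col / width, scale * row / height, gradients)
             for col in range(width)] for row in range(height)]
```

Calling `noise_map` twice with the same seed returns identical maps, while a different seed gives a different map with the same statistical character, which is the property the repeated experiments rely on.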

Figure  4.3  Perlin  noise  generations  in  scale  4  with  different  random  seed  values    

4.2  The  ONE  simulator  

In this project, the ONE (Opportunistic Network Environment) simulator was chosen to simulate the participatory sensing process. Nodes in the ONE simulator act as participants (data collectors). The ONE simulator provides multiple movement models for sensor nodes. In addition, several parameters of the project (the number of sensor nodes, the moving speed and the total simulation time) are set in the ONE simulation.

[…] noise, the ONE simulator can simulate participants moving as preset and periodically reading and uploading spatial data at their current positions.

4.2.1  What  is  the  ONE  simulator?  

The ONE is an Opportunistic Network Environment simulator developed by researchers at Helsinki University in Finland [25]. Opportunistic networking is a subclass of Delay-Tolerant Networking in which network contacts are intermittent or link performance is highly variable. In such networks there is no end-to-end path between source and destination for most of the time, and any path that exists can be highly unstable and may change or break frequently. To make communication possible in opportunistic networks, intermediate nodes use different protocols to ferry messages [26].

The ONE simulator can generate node movement according to different movement models. It also routes messages between nodes with different routing algorithms and various sender-receiver types. In addition, users can visualize node movement and message delivery in real time in the graphical user interface of the ONE, and the simulator provides multiple ways to visualize data and results. Figure 4.4 shows the components of the ONE simulator [20].

Figure  4.4  components  of  the  ONE  simulator  [20]  

4.2.2 Why the ONE simulator?

[…] ONE simulator provides a batch mode for running simulations, which is suitable for a large number of experiments. Fourthly, the ONE simulator is developed in Java, and the system prototype is also based on Java. Technically, the ONE simulator can therefore be integrated into the system prototype seamlessly.

4.2.3  The  ONE  simulator  in  system  prototype  

Perlin noise generations are used to simulate spatial data maps, and the ONE simulator is used to simulate sensing and sampling of spatial data. Combining the two makes it possible to model participatory sensing. Random waypoint is chosen as the nodes' movement model in the simulations, for two main reasons. Firstly, it is the most commonly used general-purpose movement model [27]. Secondly, the movement of people is relatively random compared to pre-deployed sensors. Simulation conditions are set by varying the values of parameters in the ONE simulator. The parameters that need to be set are the following:

• (X, Y): The size of the Perlin noise generation, as well as the size of the movement area for sensors.

• Simulation Time (s): The total time of the simulation.

• Update Interval (s): The time interval at which participants (nodes) read and upload data.

• Number of Hosts: The number of participants (nodes).

• Moving Speed (m/s): The moving speed of participants (nodes).

• Mobility Model: The movement model for participants (nodes); it is set to random waypoint.

• Mobility Random Seed: The value of the movement random seed.
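How these parameters interact can be sketched with a small stand-alone Python model of random waypoint sampling. This is not the ONE simulator and does not use its API: the function name and the `field` callback are hypothetical, and the sketch assumes a constant speed and no pause time at waypoints.

```python
import math
import random


def random_waypoint_samples(field, width, height, n_hosts, speed,
                            sim_time, update_interval, seed):
    """Return (x, y, value) readings taken by hosts moving by random waypoint."""
    rng = random.Random(seed)
    hosts = []
    for _ in range(n_hosts):
        pos = (rng.uniform(0, width), rng.uniform(0, height))
        dest = (rng.uniform(0, width), rng.uniform(0, height))
        hosts.append([pos, dest])

    samples = []
    steps = int(sim_time // update_interval)
    for _ in range(steps):
        for host in hosts:
            budget = speed * update_interval  # distance covered per interval
            while budget > 0:
                (x, y), (dx, dy) = host
                dist = math.hypot(dx - x, dy - y)
                if dist <= budget:
                    # Waypoint reached: move there and draw a new destination.
                    budget -= dist
                    host[0] = (dx, dy)
                    host[1] = (rng.uniform(0, width), rng.uniform(0, height))
                else:
                    frac = budget / dist
                    host[0] = (x + frac * (dx - x), y + frac * (dy - y))
                    budget = 0
            # Each host reads the field at its current position once per interval.
            x, y = host[0]
            samples.append((x, y, field(x, y)))
    return samples
```

With `n_hosts` participants and `sim_time // update_interval` update steps, the sketch yields one reading per host per step, so the number of samples grows linearly with both the number of hosts and the simulation time.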

 


Chapter  5.  Prediction  Methods  

In this chapter, three prediction methods for spatial data are introduced: the Voronoi diagram, Delaunay triangulation with gradient, and Kriging. The Voronoi diagram is a classic structure for nearest-neighbor searching that is widely used in computational geometry [28]. It is introduced first because of its simple idea and easy realization: it is natural to use the value of a sample to predict the area near it. Delaunay triangulation is the dual of the Voronoi diagram. Delaunay triangulation with gradient combines Delaunay triangulation [29] with barycentric coordinates [30]. Kriging is a geostatistical estimator, an optimal interpolation based on observed values and weights derived from spatial covariance values [31]. The three prediction methods are presented in order of increasing implementation complexity.

5.1  Voronoi  diagram  

5.1.1  What  is  Voronoi  diagram  

The Voronoi diagram is a fundamental geometric data structure, also called the Dirichlet tessellation. It was first proposed by Dirichlet in 1850 [32]. A Russian mathematician, Georgy Fedoseevich Voronoi, then gave a further explanation in 1907 [33]. A Voronoi diagram is a way to divide a space into a number of regions. Let P = {p1, p2, …, pn} be a set of specified points in a space, which are called seeds. For each seed there is a corresponding region consisting of all points that are closer to this seed than to any other. These regions are called "Voronoi cells" or Thiessen polygons. A Thiessen polygon has the following properties:

1. Every Thiessen polygon contains exactly one seed.

2. Every point inside a Thiessen polygon has exactly one closest seed, which is the seed of that Thiessen polygon.

3. The points on an edge of a Thiessen polygon are at equal Euclidean distance from two adjacent seeds. In other words, the edges of Thiessen polygons lie on the perpendicular bisectors of the lines connecting two adjacent seeds.


Figure  5.1  Voronoi  diagram  

The Voronoi diagram supports nearest-neighbor searching [28]. It was chosen as the first prediction method in this project because of its simple idea and straightforward realization. It is widely used in data analysis, data mining and data prediction, and in fields such as geometry, crystallography, geography and meteorology.

5.1.2  Algorithm  of  Voronoi  diagram  

The idea of the Voronoi diagram is easy to understand. First, divide the area into pieces such that each piece contains exactly one sample. Then, the predicted value of every point in a piece is the value of the sample in that piece.

The implementation of the Voronoi diagram is the easiest of the three prediction methods. The key is to find the Thiessen polygons for the given samples. The procedure for producing a Voronoi diagram can be decomposed into the following steps [34].

1. Input the samples' coordinates and values.

2. Traverse all the points on the plane. For each point, do the following:

   1) Calculate the Euclidean distance to each sample.

   2) Count the number of nearest samples.

   3) Classify the point according to its number of nearest samples. A point with only one nearest sample lies inside a Thiessen polygon. A point with more than one nearest sample lies on an edge of a Thiessen polygon.

   4) Assign a point inside a Thiessen polygon the value of the sample in that polygon. Assign a point on an edge of a Thiessen polygon the mean value of its closest samples.

   5) Generate a grayscale image according to the assigned values.
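The steps above can be sketched in Python as a brute-force nearest-sample search over an integer pixel grid. The function name and the `(x, y, value)` sample layout are assumptions for illustration; ties are detected with exact floating-point equality, which a real implementation might replace with a small tolerance.

```python
import math


def voronoi_predict(samples, width, height):
    """Predict every grid point from its nearest sample(s).

    samples: list of (x, y, value). Returns a height x width grid of values.
    """
    grid = []
    for py in range(height):
        row = []
        for px in range(width):
            # Step 2.1: Euclidean distance from this point to every sample.
            dists = [math.hypot(px - sx, py - sy) for sx, sy, _ in samples]
            nearest = min(dists)
            # Steps 2.2 and 2.3: collect all samples tied for nearest.
            tied = [v for d, (_, _, v) in zip(dists, samples) if d == nearest]
            # Step 2.4: a single nearest sample gives its own value,
            # several nearest samples (a cell edge) give their mean.
            row.append(sum(tied) / len(tied))
        grid.append(row)
    return grid
```

The cost is proportional to the number of pixels times the number of samples, which is acceptable for a sketch; spatial indexing would be needed for large maps.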


Figure  5.2  true  map  of  Perlin  noise  in  scale  1  (A)  and  predicted  map  with  Voronoi  diagram  by  using  2000  samples   (B)  

Figure  5.3  true  map  of  Perlin  noise  in  scale  4  (A)  and  predicted  map  with  Voronoi  diagram  by  using  2000  samples   (B)  

5.2  Delaunay  triangulation  with  gradient  


5.2.1  Delaunay  triangulation  

The idea of Delaunay triangulation is to divide an area into a triangle mesh. In this paper, all the vertices of the triangles are samples. The rules of Delaunay triangulation constrain the triangle mesh to be unique and as regular as possible.

   

To illustrate Delaunay triangulation, we first define Delaunay edges. Given a point set P, let E be a set of edges whose endpoints are points from P. Let e be an edge in E with endpoints a and b. If e satisfies the "empty circle" property, it is a Delaunay edge. An edge has the "empty circle" property if and only if there is a circle passing through a and b that does not contain any other point from P. If a triangulation of P contains only Delaunay edges, then it is a Delaunay triangulation [36]. Delaunay triangulation is named after Boris Delaunay for his work on this topic from 1934. Figure 5.4 shows a Delaunay triangulation.

Delaunay triangulation has some good properties:

1. Uniqueness

No matter where the construction of the triangle network starts, the final triangle network is always the same for the same input set of discrete points.

2. Most regular triangle network

Delaunay triangulation maximizes the minimum angle of all the triangles in the triangulation, which results in the most regular triangle network.

3. Regionality

Adding, deleting or moving a vertex of the triangle network only affects the triangles nearby.
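The empty circle idea can be checked numerically. The sketch below uses the equivalent triangle-based formulation, in which a triangulation is Delaunay when no input point lies strictly inside the circumcircle of any triangle, rather than the edge-based definition given above. The function names and the index-triple representation of triangles are assumptions for illustration.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of (a, b, c).
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return det * orient > 0


def is_delaunay(points, triangles):
    """Check the empty-circumcircle condition for every triangle.

    triangles: list of (i, j, k) index triples into points.
    """
    for i, j, k in triangles:
        for n, p in enumerate(points):
            if n in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True
```

For four points forming a kite, only one of the two possible diagonals passes this check, which illustrates the uniqueness property for points in general position.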

  Figure  5.4  Delaunay  triangulation  

 

5.2.2  Barycentric  coordinates  

[…] be written as a weighting of the three vertices, $(\lambda_1, \lambda_2, \lambda_3)$, with the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$ [38]. The three parameters of the barycentric coordinates represent the proportional influence of the three vertices. In other words, a vertex closer to the point has a greater effect than the other two vertices, and its corresponding parameter is larger than the others. If the values at the three vertices of a triangle are $r_1$, $r_2$ and $r_3$, and $r$ is the value of a point P in this triangle, then $r$ can be calculated by the formula $r = \lambda_1 r_1 + \lambda_2 r_2 + \lambda_3 r_3$.

5.2.3  Implementation  of  barycentric  coordinates  

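The weights of the three vertices for a point P in $\triangle ABC$ are the same as the barycentric coordinates of P. Let the vertices A, B and C of the triangle be written as $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, and let the barycentric coordinates of P be $(\lambda_1, \lambda_2, \lambda_3)$ with the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$. Then $\lambda_1 = S_{\triangle PBC} / S_{\triangle ABC}$, $\lambda_2 = S_{\triangle PCA} / S_{\triangle ABC}$ and $\lambda_3 = S_{\triangle PAB} / S_{\triangle ABC}$, where $S$ denotes the area of a triangle, which is shown in Figure 5.5.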

Figure  5.5  Barycentric  coordinate  

The area of a triangle in the formula can be calculated by Heron's formula [39], which is defined in (5.1). The edges of a triangle are written as $a$, $b$ and $c$, and the area of the triangle is written as $S$.

$$S = \sqrt{p(p - a)(p - b)(p - c)}, \qquad p = \frac{a + b + c}{2} \qquad (5.1)$$

Given the coordinates of the three vertices A, B and C of $\triangle ABC$ in two-dimensional space, $(x_a, y_a)$, $(x_b, y_b)$ and $(x_c, y_c)$, the lengths of the three edges of $\triangle ABC$ are $AB = \sqrt{(x_b - x_a)^2 + (y_b - y_a)^2}$, $BC = \sqrt{(x_c - x_b)^2 + (y_c - y_b)^2}$ and $CA = \sqrt{(x_a - x_c)^2 + (y_a - y_c)^2}$. If a point is on an edge of the triangle, Heron's formula is not suitable anymore. According to the "affection theory", the point is only affected by the two vertices on the same edge. If point P is on the edge AB of $\triangle ABC$, then the […]


5.2.4  Delaunay  triangulation  with  gradient  

The combination of Delaunay triangulation and barycentric coordinates generates smoothly changing values within the Delaunay convex hull. A point outside the Delaunay convex hull is predicted with the value of the nearest predicted point. The true map and the map predicted by Delaunay triangulation with gradient for Perlin noise in scale 1 are shown in Figure 5.6. Figure 5.7 shows the true map and the predicted map for Perlin noise in scale 4.
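The interpolation inside a single Delaunay triangle can be sketched as follows, assuming the usual area-ratio form of the barycentric weights and computing the areas with Heron's formula. The function names are hypothetical, the sketch is only valid for points inside or on the triangle, and it does not cover the triangulation itself or points outside the convex hull.

```python
import math


def heron_area(p, q, r):
    """Triangle area from its three edge lengths (Heron's formula)."""
    a, b, c = math.dist(p, q), math.dist(q, r), math.dist(r, p)
    s = (a + b + c) / 2.0
    # Clamp at zero: rounding can make the product slightly negative.
    return math.sqrt(max(0.0, s * (s - a) * (s - b) * (s - c)))


def barycentric_weights(p, a, b, c):
    """Weights of vertices a, b, c for a point p inside triangle abc."""
    total = heron_area(a, b, c)
    # Each weight is the area of the sub-triangle opposite the vertex.
    return (heron_area(p, b, c) / total,
            heron_area(p, c, a) / total,
            heron_area(p, a, b) / total)


def interpolate(p, vertices, values):
    """Value at p as the barycentric-weighted sum of the vertex values."""
    weights = barycentric_weights(p, *vertices)
    return sum(w * v for w, v in zip(weights, values))
```

For the triangle with vertices (0, 0), (4, 0) and (0, 4), the point (1, 1) receives the weights 0.5, 0.25 and 0.25, so the vertex closest to the point contributes most to the predicted value.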

Figure  5.6  true  map  of  Perlin  noise  in  scale  1  (A)  and  predicted  map  with  Delaunay  triangulation  with  gradient  by   using  2000  samples  (B)  

Figure  5.7  true  map  of  Perlin  noise  in  scale  4  (A)  and  predicted  map  with  Delaunay  triangulation  with  gradient  by   using  2000  samples  (B)  

5.3  Kriging  

[…] perspective, Kriging is an unbiased optimal estimation method that, based on the correlation and variability of variables, targets the prediction of regionalized variables. From the perspective of differential analysis, Kriging is a method to generate a linear, optimal and unbiased interpolation estimate for spatial data [40]. Kriging is suitable for spatially correlated regionalized variables. To understand Kriging, the concepts of regionalized variables and the variogram function are introduced first, in Sections 5.3.1 and 5.3.2 respectively. Kriging itself is introduced in Section 5.3.3.

5.3.1  Regionalized  variables  

Regionalized variables are variables that are distributed in space. Such a variable reflects a property of a spatial distribution. Mineral content, meteorological and ecological quantities, temperature, humidity and concentration are all regionalized variables. Without a description of their location, these values are meaningless [41]. A regionalized variable has the following features:

1. The regionalized variable Z(X) is a random function. It is a random value before observation.

2. Regionalized variables have a general structure. That is to say, the random variables Z(X) and Z(X + h) at points X and X + h are autocorrelated to some extent. This autocorrelation depends on h, the distance between the two points, and on the structural characteristics of the regionalized variable.

5.3.2  Variogram  function  

The variogram function is a basic tool of geostatistical analysis and is frequently used in the estimation process. It describes the spatial dependency of a spatial random field [42], that is, it characterizes spatial correlation. In one-dimensional space, the variogram function is defined as follows. As the point x moves along the X axis, Z(x) and Z(x + h) are the regionalized variables at point x and point x + h. The variogram function of Z(x) on the X axis is denoted $\gamma(h)$, and its expression is shown in (5.2).

$$\gamma(h) = \frac{1}{2} E\left[\big(Z(x) - Z(x + h)\big)^2\right] \qquad (5.2)$$

5.3.3  Kriging  

Simply speaking, Kriging interpolation is a method that weights the values of measurement points around a point to be predicted in order to generate the value of the predicted point. The Kriging interpolation algorithm is similar to the inverse distance weighting method; the key for both of them is calculating the weights. The commonly used formula for Kriging interpolation is shown in expression (5.3), in which $Z(s_i)$ is the observed value at position i, $\lambda_i$ is the unknown weight at position i, $s_0$ is the predicted position, and N is the total number of measured positions [42].

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i) \qquad (5.3)$$

 

In Kriging interpolation, the weights of the measurement points depend not only on the distances between measurement points, but also on the overall spatial distribution of the measurement points. That is to say, the spatial autocorrelation needs to be quantified. Therefore, Kriging interpolation usually has two steps: 1. create the variogram function and covariance function to estimate the spatial autocorrelation; 2. predict the values of unknown points according to the Kriging interpolation formula.

First of all, pair all the measurement points and calculate the distance between each pair. Then determine the "lag" according to the shortest and the longest distance. For example, Figure 5.8 is the scatter plot obtained with a lag of 10 meters, in which the Y-axis represents the average value of the semivariogram and the X-axis represents the lag.

Figure  5.8  Scatter  plot  of  semivariogram(y-­‐axis:  average  value  of  semivariogram,  x-­‐axis  :  lag)  
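The pairing and lag-binning step that produces such a scatter plot can be sketched in Python. The sketch assumes samples given as `(x, y, value)` tuples and assumes that bin k collects all pairs whose distance lies in [k · lag, (k + 1) · lag); the function name is hypothetical and the binning rule of the Matlab code used in this paper may differ.

```python
import math
from itertools import combinations


def empirical_semivariogram(samples, lag):
    """Average semivariance per lag bin.

    samples: list of (x, y, value).
    Returns {bin_index: mean of 0.5 * (z_i - z_j)^2 over the pairs in the bin}.
    """
    sums, counts = {}, {}
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        # Bin index of this pair, from the distance between the two points.
        k = int(math.hypot(x2 - x1, y2 - y1) // lag)
        sums[k] = sums.get(k, 0.0) + 0.5 * (z1 - z2) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

Plotting the returned means against the bin distances gives the experimental semivariogram to which a model curve is then fitted.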

The next step is fitting a model to the scatter plot of the experimental semivariogram function. In principle, this is similar to regression analysis: it is a process of constructing a curve that best fits the data points. Kriging can use many models for the experimental semivariogram function: the linear model, exponential model, Gaussian model, spherical model and circular model. In this paper, the exponential model is chosen as the fitting model, since it is the most widely used model. The exponential model is shown in Figure 5.9. The fitting result is shown in Figure 5.10.


Figure  5.9  the  exponential  model  

Figure  5.10  Fitting  result  of  exponential  model  

How is the value of an unknown point obtained from the semivariogram function and the measurement points? The related points are found first, and the value of the unknown point is then calculated from the weights of the related points with respect to the unknown point. Here is an example. There are three points 1, 2 and 3, and a point 0 that needs to be predicted. We calculate the value of point 0 according to the following equations, which ensure the minimum estimation error. In the expressions, $\gamma(h_{ij})$ represents the semivariance of point i and point j, and $\lambda$ represents the Lagrange multiplier. Expressions (5.4) show how to calculate the weights.

The weights of points 1, 2 and 3 with respect to point 0, denoted $w_1$, $w_2$ and $w_3$, are obtained from expressions (5.4). The value of point 0 can be calculated by substituting $w_1$, $w_2$ and $w_3$ into formula (5.5).

$$Z_0 = w_1 Z_1 + w_2 Z_2 + w_3 Z_3 \qquad (5.5)$$
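For the three-point example, the ordinary Kriging system is commonly written as $\sum_j w_j \gamma(h_{ij}) + \lambda = \gamma(h_{i0})$ for i = 1, 2, 3, together with $w_1 + w_2 + w_3 = 1$. The following Python sketch solves that system and then applies formula (5.5). It is not the Matlab code used in this paper: the exponential semivariogram without nugget, its sill and range values, and all function names are assumptions for illustration.

```python
import math


def exp_semivariogram(h, sill=1.0, range_param=10.0):
    """Exponential model without nugget: sill * (1 - exp(-h / range_param))."""
    return sill * (1.0 - math.exp(-h / range_param))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def ordinary_kriging(points, values, target, gamma=exp_semivariogram):
    """Return (prediction, weights) at target from the measured points."""
    n = len(points)
    # Rows 0..n-1: sum_j w_j * gamma(h_ij) + lambda = gamma(h_i0).
    matrix = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
              for i in range(n)]
    # Last row: the weights must sum to one.
    matrix.append([1.0] * n + [0.0])
    rhs = [gamma(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(matrix, rhs)
    weights = solution[:n]  # solution[n] is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights
```

Because the assumed model has no nugget, predicting at the location of a measured point returns the measured value itself, and the weights always sum to one.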

There are several types of Kriging, including Ordinary Kriging, Simple Kriging, Universal Kriging, Co-Kriging, Logistic Normal Kriging, Indicator Kriging, Probability Kriging and Disjunctive Kriging. This paper uses ordinary Kriging, since it is the most widely used Kriging method. Ordinary Kriging assumes that the mean value is constant and unknown. If there is no scientific basis against this assumption, it is reasonable [43]. Other Kriging methods are not discussed in this paper; interested readers can refer to the related citations in the references. In this paper, ordinary Kriging is performed with open-source code implemented in Matlab by Wolfgang Schwanghart. Predictions of Kriging for Perlin noise in scale 1 and scale 4 are shown in Figure 5.11 and Figure 5.12 respectively.


Figure  5.12  Prediction  of  Ordinary  Kriging  for  Perlin  noise  in  scale  4  


Chapter  6.  Evaluation  of  prediction  methods  

To evaluate the performance of the different prediction methods, the system prototype uses grayscale images and the Root Mean Square Error (RMSE) to compare the true and predicted spatial data maps. Together, the two methods cover the range from a rough visual comparison to a precise quantitative analysis.

6.1  Grayscale  image  

In computing, a grayscale image is an image in which each pixel has only one scalar value. Such an image shows shades from black to white with gradations of grey in between. The color and brightness of objects are all expressed through different grey levels. A grayscale image is different from a black-and-white image, which has only the two colors black and white. The different degrees of grey are represented by a grayscale value in the range from 0 to 255, where 0 represents black and 255 represents white. From black to white, there are 256 levels of grey [44].

 

To compare prediction methods visually, quickly and roughly, the simulation generates grayscale images of the true Perlin noise generation and of the predicted one. Pixel values of Perlin noise generations are real numbers in the range [-1, 1], whereas pixel values of grayscale images are integers in the range [0, 255]. To show a Perlin noise image in grayscale, the Perlin noise values are converted into the required range: given the Perlin noise value r of a pixel, the grayscale value of this pixel is (r + 1) * 127.5 rounded up. Figure 6.1 shows grayscale images of Perlin noise in scale 1 and scale 4 respectively.
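The conversion can be written as a one-line mapping. The sketch below interprets "rounded up" as the ceiling function, which is an assumption, and clamps the result to the valid range; the function name is hypothetical.

```python
import math


def to_grayscale(r):
    """Map a noise value r in [-1, 1] to an integer grey level in [0, 255]."""
    level = math.ceil((r + 1.0) * 127.5)
    # Clamp in case r lies slightly outside [-1, 1].
    return min(255, max(0, level))
```

Under this interpretation, r = -1 maps to black (0), r = 1 maps to white (255) and r = 0 maps to the mid-grey level 128.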

 


6.2  RMSE  

Root mean square error (RMSE) measures the error between predicted values and actually observed values [45]. RMSE is sensitive to very large or very small errors in a set of measurements, so it reflects the precision of the measurement well [46]. RMSE is well suited to evaluating the results of different prediction models with the same parameter settings over multiple experiments. However, it is not suitable for comparing the same model with different parameters. Formula (6.1) defines RMSE, in which $\hat{y}_j$ is the predicted value, $y_j$ is the observed value, and n is the number of experiments.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(\hat{y}_j - y_j\right)^2} \qquad (6.1)$$

In the simulation, the error is calculated for each pixel of the true map and the predicted map, so n in formula (6.1) is the number of pixels. Since the value of each pixel of Perlin noise lies in [-1, 1], the RMSE between the true map and the predicted map lies in [0, 2]. An RMSE of 0 means that the true Perlin noise map and the predicted one are exactly the same. The smaller the RMSE, the better the accuracy.
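The per-pixel RMSE between two maps can be sketched in a few lines of Python, assuming both maps are stored as equally sized lists of rows; the function name is hypothetical.

```python
import math


def rmse(true_map, predicted_map):
    """Root mean square error over all pixels of two equally sized maps."""
    squared = [(p - t) ** 2
               for t_row, p_row in zip(true_map, predicted_map)
               for t, p in zip(t_row, p_row)]
    return math.sqrt(sum(squared) / len(squared))
```

Two identical maps give an RMSE of 0, and two maps that differ by the full value range at every pixel (1 against -1) give the upper bound of 2.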


Chapter  7.  System  prototype  

7.1  System  prototype  

The experiment is divided into three procedures: sampling, prediction and algorithm evaluation. The sampling procedure applies the ONE simulator to simulate manual data collection, and uses Perlin noise generations to simulate the spatial data map to be sampled. The participatory sensing data are then obtained according to the configured parameters. In the prediction procedure, the three prediction methods described in Chapter 5 are implemented to generate the whole predicted map of Perlin noise from the discrete samples. The algorithm evaluation procedure evaluates the three prediction methods based on the RMSE between the ground-truth Perlin noise generation and the predicted one.

 

This paper implements a Java-based prototype system that samples the data, generates predicted results and evaluates the prediction methods on a computer. In addition, different parameter settings can be run in batch mode with the system prototype. Once the parameters are entered, the predicted results and the RMSE are generated automatically, as shown in Figure 7.1.

 

  Figure  7.1  System  prototype  design  

7.1.1  System  prototype  development  environment  

Java is an object-oriented, simple and type-safe programming language that is general-purpose, efficient and portable. Java provides powerful frameworks and libraries to support system development. Moreover, this paper uses the ONE opportunistic network simulator, an open-source tool based on Java; using Java is therefore a natural choice for developing the system.

 


7.1.2  System  prototype  structure  

Our system prototype is composed of five modules: sampling, prediction, algorithm evaluation, user interface and main frame.

 

The sampling and prediction modules are two completely independent system prototypes. The sampling module uses the opportunistic network to simulate manual sampling of data from the true map (a Perlin noise generation), and saves the samples' coordinates and spatial data values into a text file.

 

The prediction module reads the coordinates and spatial data values from the text file and executes the selected prediction method(s) to generate the predicted values of the whole geographic area.

 

The evaluation module calculates the RMSE over the whole area and generates the grayscale predicted map from the true map and the predicted map.

 

To configure all input parameters and choose the algorithms in one place, the user interface integrates the parameters required by the different modules.

Finally, the prototype system configures the ONE simulator and the Perlin noise generation to produce prediction results in batch, and stores the predicted results according to the input parameters. The main frame is the bridge among the other four modules and is responsible for transferring data.

 

The system prototype contains two independent sub-systems in order to avoid mixing up the sampled data and the predicted values, and to ensure that the data are trustworthy. In addition, the two independent sub-systems improve the reusability of the system.

 


Figure  7.3  User  interface  of  system  prototype    

7.1.3  System  prototype  process  design  


References
