Inputing environmental values for all of Sweden

Daniel Haavisto

BACHELOR'S THESIS

Högskoleingenjörsprogrammet Institutionen för Samhällsbyggnadsteknik Avdelningen för GIS-utbildningen i Kiruna

2000:11 • ISSN: 1404-5494 • ISRN: LTU-HIP-EX--00/11--SE

Daniel Haavisto GIS-3

GIS-engineer program, Kiruna, Luleå University of Technology

Document: Report

Date: 1999-11-11

Table of contents

1 INTRODUCTION
1.1 DATA COLLECTION
1.2 MODEL DEVELOPMENT
1.3 DATA INTERPOLATION

2 HEAVY METALS
2.1 LEAD
2.2 CADMIUM

3 METHODS FOR SPATIAL DATA INTERPOLATION
3.1 IDW
3.2 SPLINE

4 NEURAL NETWORK ANALYSIS
4.1.1 Building a multilayer perceptron network with NeuroSolutions
4.1.2 Cross validation and testing set
4.1.3 Hidden layers and processing elements
4.1.3 Supervised learning
4.1.4 Probe Configuration

5 MODEL DEVELOPMENT
5.1 URBAN HOTSPOTS (POPULATION)
5.2 INDUSTRIAL EMISSIONS (EMPLOYMENT AND SECTORAL EMPLOYMENT)
5.3 TRANSPORTATION CORRIDORS
5.4 NORTH-SOUTH DIFFERENCE

6 DATA AND RESEARCH DESIGN
6.1 THE MOSS SAMPLES
6.2 TOPSWING DATABASE
6.2.1 Population
6.2.2 Employment
6.2.3 Employment in polluting industries
6.3 GEOGRAPHICAL DATA
6.3.1 Coastlines
6.3.2 National borders
6.3.3 Cities
6.3.4 Roads
6.3.5 Pollution values
6.4 CONSTRUCTION OF USEFUL VARIABLES

7 IMPLEMENTATION AND RESULTS
7.1 TRAINING A NEURAL NETWORK
7.1.1 Training results
7.2 INTERPOLATE WITH ARCVIEW

8 SUMMARY
8.1 ABOUT NEURAL NETWORKS
8.2 ABOUT ARCVIEW
8.3 THE PROJECT IN GENERAL

9 CONCLUSIONS

Abstract

The goal of this project is to estimate pollution values for lead and cadmium in 100-metre squares throughout Sweden. Since actual measurements are available only for a limited number of collection stations, natural trends (e.g. proximity to transportation corridors) and the measured pollution values from the collection stations are fed into a neural network. A neural network is capable of finding the relationships between the chosen hypotheses (natural trends) and the measured pollution values. With the relationships known, estimates for each 100-metre square can be calculated, and a grid can be created and presented in a GIS.

Appendix

1. Swemoss.95
2. Sql queries
3. Sni codes
4. Geoweight.cpp
5. ASCII grid file format and grid_to_tbl.cpp
6. NS-infile

Preface

This report is part of the degree project I carried out at SMC (Spatial Modelling Centre, Kiruna).

Thanks to:

My supervisor Terance J Rephann, who has helped me with the direction and outlines of this project.

Anders Forsberg for providing me with geographical data.

The staff at SMC

1 Introduction

To date, we do not have satisfactory information about the geographical variation in environmental quality in Sweden. This makes it hard to monitor and predict local environmental changes, since the exact mechanisms behind local pollution emissions are yet to be found. The goal of this project is to find a way to estimate pollution values for lead and cadmium in 100 m squares throughout Sweden. Since actual measurements are available only for a limited number of collection stations (the moss stations, see section 6.1), a model (natural trends) has to be developed that describes the pollution value from variables other than actual measurements.

This project consists of three major steps. Data collection: data that describe the principles of the hypotheses must be created or collected for each moss station as well as for each 100 m square. Model development: a neural network will be used to model the relationships between human action and environmental change. Data interpolation: with the model, pollution data for each square can be calculated and imported into a GIS as a grid theme. If it is not possible to find a satisfactory model, a GIS (ArcView) can be used to interpolate pollution values for all of Sweden from the available moss stations as a point theme.

1.1 Data collection

Research is needed to find out what causes pollution by lead and cadmium. Data that capture the fundamental principles at work for the different types of pollution have to be found for each 100 m square (e.g. road proximity and population density). These variables are collected or created from the TOPSWING database and from the limited geographical data available, using GIS software. This is a limitation of the project, since the variables have to be derived from a limited set of data.

1.2 Model development

The problem with data fitting is that real-world data tend to be very noisy, and the exact mechanisms that generate lead and cadmium pollution are unknown. Inferring a model from the collected data makes it possible to apply mathematical reasoning to the problem. This can be done with neural network software such as NeuroSolutions (1), provided the selected variables capture the fundamental principles at work. A neural network is an adaptable system that can learn relationships through repeated presentation of data.

1.3 Data interpolation

The estimated pollution values for each 100 m square are to be presented in a GIS as a grid theme. If the neural network estimates the values successfully, the data must be arranged as an ASCII grid file and imported into ArcView. If it is not possible to estimate the values with a neural network, a grid can instead be interpolated using the spatial functions available in ArcView.
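As an illustration of arranging estimates as an ASCII grid, the following is a minimal Python sketch. It assumes the common ESRI ASCII grid layout (a six-line header followed by rows listed from north to south); the function name `to_ascii_grid`, the dictionary input and the NODATA value of -9999 are choices made for this example and are not taken from the project's own programs.

```python
def to_ascii_grid(estimates, xll, yll, cellsize, ncols, nrows, nodata=-9999):
    """Format estimates as ASCII grid text.

    estimates maps the (x, y) lower-left corner of a cell to its value.
    Cells without an estimate are written as the NODATA value.
    """
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    # The format lists rows from north to south, so start at the top row.
    for row in range(nrows - 1, -1, -1):
        y = yll + row * cellsize
        cells = []
        for col in range(ncols):
            x = xll + col * cellsize
            cells.append(str(estimates.get((x, y), nodata)))
        lines.append(" ".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical 2 x 2 grid of 100 m cells with two estimated values.
grid_text = to_ascii_grid({(0, 0): 1.5, (100, 100): 2.5}, 0, 0, 100, 2, 2)
print(grid_text)
```

Keeping the estimates in a dictionary keyed by cell corner means squares without an estimate simply fall back to the NODATA value.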

2 Heavy metals

Metals are liberated naturally from bedrock and soil. Living organisms have adapted to the natural background concentration of metals, and these metals are often essential to organisms and biological processes. But metals in the wrong concentration are poisonous. Man’s exploitation of the earth’s metal resources increases the concentration of metals in the environment to a toxic level. (2)

A substance that is acutely or chronically toxic to biological processes constitutes a risk. If it also degrades slowly and tends to bioaccumulate, it represents a threat to the environment. Several toxic metals have these characteristics since they are elements and therefore cannot be broken down into any simpler substance. Among the most important metals in this respect are lead and cadmium. (2)

This project focuses on lead and cadmium because of their toxic characteristics and because preliminary research conducted by the SMC research team showed that their pollution levels are highly correlated with population density.

2.1 Lead

Lead (Pb) is a naturally occurring bluish-gray metal. The average lead content in the earth’s crust is low, but lead is easily extracted from minerals, mainly galena. Lead ores are often found in combination with other metals such as zinc, copper and cadmium. Lead itself does not break down, but sunlight, air and water change lead compounds. When released into the air from industry or from the burning of fossil fuels or waste, it stays in the air for about 10 days. Most of the lead in soil comes from particles falling out of the air. City soils also contain lead from landfills and leaded paint. Lead sticks to soil particles. (3)

Lead is highly toxic and has no beneficial effects; inhaled or ingested lead concentrates in the blood, tissues and bones. Lead ions inhibit enzymes that catalyze the reactions for biosynthesis of hemoglobin; thus lead poisoning causes anemia. Exposure to lead is more dangerous for young and unborn children. Unborn children can be exposed to lead through their mothers. Harmful effects include premature births, smaller babies, decreased mental ability in the infant, learning difficulties, and reduced growth in young children. In adults, lead may decrease reaction time, cause weakness in fingers, wrists, or ankles, and possibly affect the memory. (3)

2.2 Cadmium

Cadmium is a soft silver-white metal, but this form is not common in the environment. Rather, cadmium is most often encountered in combination with other elements such as oxygen (cadmium oxide), chlorine (cadmium chloride), or sulfur (cadmium sulfide). These compounds are all stable solids that do not evaporate, although cadmium oxide is often found as part of small particles present in air.

Most cadmium is obtained as a by-product from the smelting of zinc, lead, or copper ores. (3) Cadmium has a number of industrial applications, but it is used mostly in metal plating, pigments, batteries, and plastics. Small quantities of cadmium occur naturally in air, water, soil, and food. For most people, food is the primary source of cadmium exposure, since food materials tend to take up and retain cadmium. For example, plants take up cadmium from soil, fish take up cadmium from water, and so on. The largest source of cadmium release to the general environment is the burning of fossil fuels (such as coal or oil). Cadmium may also escape into the air from zinc, lead, or copper smelters. Working in or living close to a major source of airborne emissions such as these may result in higher-than-average exposure. Smoking is another important source of cadmium. Like most plants, tobacco contains cadmium, some of which is inhaled in cigarette smoke. Most people who smoke have about twice as much cadmium in their bodies as do nonsmokers. (3)

Cadmium is not known to have any beneficial effects, but it can cause a number of adverse health effects. Ingestion of high doses causes severe irritation of the stomach, leading to vomiting and diarrhea, and inhalation of high doses leads to severe irritation of the lungs. (3)

3 Methods for spatial data interpolation

A surface containing estimated pollution values for every 100 m square over all of Sweden could be interpolated from the limited number of collection stations (the moss stations) that measure the actual pollution values, using the features available in ArcView. The ArcView extension Spatial Analyst has a function that interpolates a value for a cell from the surrounding points in a point theme, thus creating a grid theme with the estimated values. Two methods are available in this software package: IDW (Inverse Distance Weighted) and spline.

3.1 IDW

IDW gives a value to each cell in the output grid theme by weighting the value of each point by its distance from the cell being analyzed (p) and then averaging the weighted values. The number of points weighted (a) can be controlled in two ways: as a number of neighboring points nearest to the cell being analyzed, or as all points within a given distance of that cell. The exponent (n) of the distance (d) used in the calculation controls the significance of surrounding points for the value given to the cell being analyzed. A higher power results in less influence from distant points.

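$$p = \frac{\sum_{i=1}^{a} p_i / d_i^{\,n}}{\sum_{i=1}^{a} 1 / d_i^{\,n}}$$

where $p_i$ is the value of point $i$ and $d_i$ is its distance from the cell being analyzed.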

Barriers could be used to limit certain points from being used in the calculation of a new value for a cell, even if the point is one of the nearest neighbors or within a certain distance. Barriers can be any line theme (e.g. roads). Figure 3.1 shows a grid interpolated with the IDW method (3 points and the exponent 2). (4)
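The IDW calculation described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not ArcView's implementation: it assumes the normalized weighted-average form of IDW, selects the nearest neighbours by straight-line distance, and ignores barriers. The function name `idw` and the sample coordinates are invented for the example.

```python
import math


def idw(cell, points, power=2.0, neighbours=3):
    """Estimate the value at cell = (x, y) from points = [(x, y, value), ...].

    Only the `neighbours` nearest points are used, and each is weighted by
    1 / distance**power.
    """
    def distance(point):
        return math.hypot(point[0] - cell[0], point[1] - cell[1])

    nearest = sorted(points, key=distance)[:neighbours]
    weighted_sum = 0.0
    weight_total = 0.0
    for point in nearest:
        d = distance(point)
        if d == 0.0:
            # The cell coincides with a measured point: use its value directly.
            return point[2]
        weight = 1.0 / d ** power
        weighted_sum += weight * point[2]
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations, 2 km apart, with values 10 and 20.
stations = [(0.0, 0.0, 10.0), (2000.0, 0.0, 20.0)]
midpoint = idw((1000.0, 0.0), stations)        # equal distances, equal weights
near_first = idw((500.0, 0.0), stations)       # pulled towards the value 10
print(midpoint, near_first)
```

Raising `power` makes the nearest station dominate more strongly, which is the effect of the exponent described above.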

3.2 Spline

The spline method fits a curve through the points around the cell being analyzed and gives the cell the value of the curve at that location. The type of curve fitting can be (1) regularized, which creates a smoother flowing surface, or (2) tension, which results in a rougher surface that conforms tightly to the input data points.

The weight controls the tautness of the curves. With the regularized type, the output surface becomes increasingly smooth as the weight increases. With the tension type, increasing the weight causes the surface to become stiffer, eventually conforming closely to the input points. The number of points used to fit the curve can be controlled. Figure 3.2 shows a grid interpolated with the spline method (3 points and the weight 0.1). (4)

Figure 3.1. Surface interpolated with the IDW method (3 points and the exponent 2).

Figure 3.2. Surface interpolated with the spline method (3 points and the weight 0.1).

4 Neural Network Analysis

Neural networks are used in many different areas such as medical and economic research. The major advantage of a neural network is its use in solving real world problems. When the exact mechanism of causation is not known, when the calculations are difficult and time consuming, when the correlation between the input data and the output data is known but the exact equations are not, and when the data are so noisy that ordinary calculation methods are insufficient to find the relationships, neural networks can be advantageous.

A neural network is an adaptable system that can learn relationships through repeated presentation of data. A set of inputs and the corresponding desired outputs is used to learn the input-output relationships. The most commonly used neural network architecture is the multilayer perceptron.

Neural networks are built from a large number of very simple processing elements that individually deal with pieces of a big problem. A processing element simply multiplies an input by a set of weights, and a nonlinearity transforms the result into an output value. (6)

Normally, a neural network has one or several layers of processing elements (hidden layers). It is the nonlinear transfer function of the hidden layers that gives a neural network its ability to learn difficult problems. In NeuroSolutions there are several transfer functions (Axons) available, of which the hyperbolic tangent (TanhAxon) is the most common. The input data are scaled and shifted to fit the range of the Axon (i.e., -1 to 1 for TanhAxon), and the output is then calculated with the current weights. An algorithm called backpropagation (the learning rule) is then used to adjust the weights a small amount at a time in a way that reduces the error. The network is trained by repeating this process many times.

The goal of the training is to reach an optimal solution based on the performance measurement. The output of the network is compared with a desired response to produce an error. (6)
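The training cycle described above (scale the inputs, compute the output with the current weights, compare it with the desired response, adjust the weights by backpropagation) can be sketched in plain Python. This is a minimal illustration and not NeuroSolutions' implementation: the class `TinyMLP`, the learning rate, the number of hidden elements and the five data points are all assumptions made for the example.

```python
import math
import random


def scale(values, lo=-1.0, hi=1.0):
    """Linearly map values onto [lo, hi], the output range of tanh."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


class TinyMLP:
    """One hidden tanh layer and a single tanh output element."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for a bias input fixed at 1.0.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, inputs):
        x = list(inputs) + [1.0]
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.w_hidden]
        h = hidden + [1.0]
        output = math.tanh(sum(w * v for w, v in zip(self.w_out, h)))
        return x, h, output

    def train_epoch(self, samples, rate=0.1):
        """Process the data set once and return the mean square error."""
        total = 0.0
        for inputs, desired in samples:
            x, h, output = self.forward(inputs)
            error = desired - output
            total += error * error
            # Backpropagation: the derivative of tanh(u) is 1 - tanh(u)^2.
            delta_out = error * (1.0 - output * output)
            delta_hidden = [delta_out * self.w_out[j] * (1.0 - h[j] * h[j])
                            for j in range(len(self.w_hidden))]
            for j in range(len(self.w_out)):
                self.w_out[j] += rate * delta_out * h[j]
            for j, row in enumerate(self.w_hidden):
                for i in range(len(row)):
                    row[i] += rate * delta_hidden[j] * x[i]
        return total / len(samples)


raw_inputs = [0.0, 25.0, 50.0, 75.0, 100.0]   # made-up input values
desired = [-0.5, -0.25, 0.0, 0.25, 0.5]       # made-up desired responses
samples = [([x], d) for x, d in zip(scale(raw_inputs), desired)]

net = TinyMLP(n_inputs=1, n_hidden=3)
history = [net.train_epoch(samples) for _ in range(200)]
print(f"MSE in first epoch {history[0]:.4f}, in last epoch {history[-1]:.4f}")
```

The list `history` plays the role of a training curve: on data with a learnable relationship the mean square error should fall as the epochs are repeated.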

Figure 4.1. A simple multilayer perceptron.

The picture shows a simple multilayer perceptron. The circles are the processing elements arranged in layers. The left row is the input layer, the middle row is the hidden layer, and the right row is the output layer. The lines represent weighted connections between processing elements.

4.1.1 Building a multilayer perceptron network with NeuroSolutions.

The first step in building a neural network is choosing a neural model, in this case the multilayer perceptron. The next step is to select and arrange the training data. Training data must be arranged in columns and saved in ASCII file format. Each column is tagged as input, as the desired response of the network, or as a column to be skipped entirely. (5)

4.1.2 Cross validation and testing set

Neural networks can be overtrained. Overtraining results in a network that memorizes the individual values rather than the trends in the data. To prevent overtraining, a part of the data set can be set aside for cross-validation. Cross-validation monitors the error on an independent set of data and stops training when this error begins to increase. Once a network is trained, the weights are frozen, the test set is fed through the network, and the output is compared with the desired output. (5)
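The idea of holding back part of the data and stopping when the cross-validation error starts to rise can be sketched as follows. The split fractions, the `patience` parameter and the error values are hypothetical and chosen only for illustration; NeuroSolutions' own stopping logic may differ.

```python
import random


def split(rows, cv_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into training, cross-validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_cv = int(len(rows) * cv_fraction)
    n_test = int(len(rows) * test_fraction)
    return rows[n_cv + n_test:], rows[:n_cv], rows[n_cv:n_cv + n_test]


def early_stopping_epoch(validation_errors, patience=1):
    """Return the epoch whose weights should be kept.

    Training stops once the cross-validation error has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


training, cross_validation, testing = split(range(10))

# Made-up cross-validation errors: improving at first, then rising.
errors = [0.9, 0.5, 0.3, 0.35, 0.4, 0.2]
stop_strict = early_stopping_epoch(errors, patience=1)
stop_patient = early_stopping_epoch(errors, patience=5)
print(stop_strict, stop_patient)
```

A larger `patience` tolerates temporary increases in the cross-validation error, at the cost of a longer training run.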

4.1.3 Hidden layers and processing elements

It has been shown that a multilayer perceptron with a single hidden layer can learn any desired continuous input-output mapping if there is a sufficient number of processing elements. However, at least two layers have been shown to be preferable. The number of processing elements should be based on the complexity of the problem, which in practice can only be determined experimentally. The transfer function should be chosen for each layer (for most problems the TanhAxon). A learning rule is used to calculate the weight update. NeuroSolutions offers four different learning rules, of which momentum is the recommended one. In momentum learning, the past increment to the weights is used to speed up and stabilize convergence. (5)
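A momentum weight update can be sketched as below. The step size, the momentum coefficient and the quadratic error function are assumptions made for the example and are not NeuroSolutions defaults.

```python
def momentum_step(weight, gradient, previous_step, rate=0.1, momentum=0.7):
    """Return the updated weight and the step that was taken.

    The new step is the usual gradient step plus a fraction of the
    previous step, which is what "the past increment" refers to.
    """
    step = -rate * gradient + momentum * previous_step
    return weight + step, step


# Toy error surface E(w) = (w - 3)^2 with gradient 2 * (w - 3).
weight, step = 0.0, 0.0
for _ in range(100):
    gradient = 2.0 * (weight - 3.0)
    weight, step = momentum_step(weight, gradient, step)
print(weight)
```

Setting `momentum=0.0` reduces the rule to plain gradient descent, which makes the contribution of the past increment easy to compare.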

4.1.3 Supervised learning

Training is terminated either (1) when the maximum number of epochs is reached (i.e., the maximum number of times the data set should be processed), or (2) when another termination criterion is met, such as a threshold value for the mean square error.

4.1.4 Probe Configuration

NeuroSolutions has a number of tools (probes) for visualizing the data flowing through the network. These include MegaScope, an oscilloscope-type plot over time; MatrixViewer, a standard numerical display; and DataWriter, which accumulates the data over time in a separate window and can save them as a text file.

Figure 4.1. NeuroSolutions network with labeled components: input layer; hidden layer with TanhAxon transfer function; output layer with TanhAxon transfer function; backpropagation (learning rule), momentum; MegaScope plotting mean square error; DataWriter with actual and desired output.

5 Model development

For the neural network to train properly, so that it can be used to estimate the pollution values for lead and cadmium, explanatory variables had to be found. The following variables were used in the process of finding a model with NeuroSolutions.

5.1 Urban hotspots (population)

The size and density of the population is highly correlated with the emission of toxic heavy metals.

Figure 5.1 shows a plot between the pollution values at the moss stations and a weighted (see chapter 6) value of population density for each station. (2)

Figure 5.1. (x-axis: population density; y-axis: pollution value; series: Pb with linear trend line)

5.2 Industrial emissions (employment and sectoral employment)

The emission of heavy metals occurs mainly in the production stage. Industries that produce and/or consume products containing heavy metals (e.g. mining) are naturally responsible for large emissions of these metals. Employment in other industries (e.g. the service sector) can be seen as a complement to the population variable, since even these types of industries consume products with heavy metal content in production and thereby pollute. Figures 5.2.1 and 5.2.2 show plots of the pollution values at the moss stations versus a weighted (see chapter 6) value of employment (5.2.1: employment in polluting industries; 5.2.2: employment in general). (2)

Figure 5.2.1. Plot of the pollution values at the moss stations versus a weighted value of employment in polluting industries (x-axis: employees in polluting industries; y-axis: pollution value; series: Pb with linear trend line).

Figure 5.2.2. Plot of the pollution values at the moss stations versus a weighted value of employment (x-axis: employment density; y-axis: pollution value; series: Pb with linear trend line).

5.3 Transportation corridors

Many studies have shown that the concentration of heavy metals in soil is greater in the proximity of highways. The distance to major highways is therefore expected to be highly correlated with contamination by heavy metals.

Figure 5.3 shows a plot of the pollution values at the moss stations versus the distance to the nearest major road.

Figure 5.3. Plot of the pollution values at the moss stations versus the distance to the nearest major road (x-axis: distance to major road; y-axis: pollution value; series: Pb with linear trend line).

5.4 North-south difference

In Sweden approximately 70 per cent of atmospheric metal deposition currently emanates from other countries, mainly in Europe. The deposition originating from Europe should therefore decrease toward the north.

Figure 5.4 shows a plot of lead pollution values versus the south-north coordinate. (2)

Figure 5.4. Plot of lead pollution values versus the south-north coordinate (x-axis: south to north; y-axis: pollution value; series: Pb with linear trend line).

6 Data and research design

6.1 The Moss samples

Studies show that a survey of metal concentrations in mosses is a valuable means of identifying sources of airborne pollution and a method for mapping metal deposition. Every five years, moss samples are collected by the Swedish Environmental Research Institute on behalf of the Swedish Environmental Protection Agency, and the pollution values for a number of heavy metals are measured. Figure 6.1 shows a map of the locations of the collection stations (moss stations) in 1995. Collection in the near vicinity of known pollution sources was avoided. The sampling sites were located at least 300 m from a major road and at least 100 m from any road or single house. (2)

For this project a file containing the moss survey of 1995 was used for the pollution values (a small part of the file swemos95.xls is appended in appendix 1).

Figure 6.1 locations of collection stations (moss stations) in Sweden 1995

6.2 TOPSWING database

All data except the roads, borders, coastlines and lakes were extracted from the TOPSWING database. The TOPSWING database, located at SMC in Kiruna, contains data from SCB (Statistics Sweden) collated from various tax, census, and social insurance registers. The following data were extracted together with the coordinates for each 100 m square.

6.2.1 Population

The number of people in each 100 m square. (see appendix 2 for SQL query)

6.2.2 Employment

The number of employed people in each 100 m square. (see appendix 2 for SQL query)

6.2.3 Employment in polluting industries

The number of employed in polluting industries in each 100 m square. (see appendix 2 for SQL query).

This was accomplished with the SNI codes (Swedish standard industrial classification). The polluting industries were identified and the SNIs collected. (7)(8) (see appendix 3 for the polluting industries and their SNIs).

6.3 Geographical data

All the geographical data except the pollution values were extracted from GSD-rödakartan. The following themes were used in this project.

6.3.1 Coastlines

The Swedish coast, for presentation only.

6.3.2 National borders

Sweden's national border, for presentation only.

6.3.3 Cities

For presentation only.

6.3.4 Roads

All roads with a width of at least 5 m were extracted.

6.3.5 Pollution values

Pollution values were imported as a dbf file from swemos95.xls.

6.4 Construction of useful variables

To train and estimate with a neural network, two sets of data are necessary. One for training (i.e., a file with all the variables and the measured pollution value for each moss station), and one for estimating pollution values for all of Sweden (i.e., a file with all variables for each 100 m square in Sweden).

All variables except the road distance were weighted for each square by the inverse of the distance squared (much like the IDW method, see section 3.1) within a radius from the point (e.g. moss station) being calculated. This was accomplished with a C++ program (see appendix 4) that takes a coordinate file (e.g. a file with the coordinates of all the moss stations) and a value file (e.g. a file with the coordinates and the population value for each square) and produces an output file with the coordinates from the coordinate file and the calculated weighted value.
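The weighting step can be illustrated with a short Python sketch. It is not the C++ program from appendix 4: the unnormalized sum of value / distance², the clamping of very small distances to `min_distance`, and all names and numbers are assumptions made for this example.

```python
import math


def weighted_value(point, squares, radius, min_distance=50.0):
    """Sum value / distance**2 over all squares within `radius` of `point`.

    point is (x, y); squares is a list of (x, y, value).
    Distances below min_distance are clamped (an assumption of this sketch)
    so that a square at the point itself does not cause a division by zero.
    """
    total = 0.0
    for x, y, value in squares:
        d = math.hypot(x - point[0], y - point[1])
        if d > radius:
            continue
        d = max(d, min_distance)
        total += value / d ** 2
    return total


# Hypothetical population squares around a station at the origin.
squares = [
    (100.0, 0.0, 200.0),     # 100 m away
    (0.0, 200.0, 400.0),     # 200 m away
    (5000.0, 0.0, 999.0),    # outside the 1000 m radius, ignored
]
station_weight = weighted_value((0.0, 0.0), squares, radius=1000.0)
print(station_weight)
```

Running the same function over every moss station and over every 100 m square would produce the two sets of weighted values needed for training and for estimation.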

The road distance variable was created with ArcView. Calculating the road distance for each square in Sweden is straightforward: ArcView has a function called “find distance” that calculates, for each cell in a user-defined grid, the distance from the active theme (grid or feature).

To get the road distance for each moss station the point theme with the moss stations must be converted to a grid with the cell value of 1. Then the cell value from the grid created with the find distance function is multiplied with the grid created from the moss stations, thus creating a new grid with the road distance value for each moss station.

Once a grid is created, it is exported as an ASCII grid file (see appendix 5) and then converted to a table with X and Y coordinates and the distance value. The conversion is done with a C++ program (see appendix 5).
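The conversion from an ASCII grid to a coordinate table can be sketched in Python as follows. This is not the program from appendix 5: it assumes the common ESRI ASCII grid header with `xllcorner` and `yllcorner`, reports cell-centre coordinates, and skips NODATA cells, all of which are choices made for this example.

```python
def grid_to_table(text):
    """Convert ASCII grid text to a list of (x, y, value) rows."""
    tokens = text.split()
    header = {}
    position = 0
    # Header entries are keyword/number pairs; data values are numeric.
    while tokens[position][0].isalpha():
        header[tokens[position].lower()] = float(tokens[position + 1])
        position += 2

    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    cellsize = header["cellsize"]
    nodata = header.get("nodata_value", -9999.0)
    values = [float(token) for token in tokens[position:]]

    table = []
    for row in range(nrows):            # row 0 is the northernmost row
        for col in range(ncols):
            value = values[row * ncols + col]
            if value == nodata:
                continue
            x = header["xllcorner"] + (col + 0.5) * cellsize
            y = header["yllcorner"] + (nrows - row - 0.5) * cellsize
            table.append((x, y, value))
    return table


example = """ncols 2
nrows 2
xllcorner 0
yllcorner 0
cellsize 100
NODATA_value -9999
-9999 2.5
1.5 -9999
"""
table = grid_to_table(example)
print(table)
```

Skipping NODATA cells keeps the table small when only a few cells carry a value, as in the road-distance grid for the moss stations.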

When all the variables have been created and calculated, an input file for NeuroSolutions can be arranged (see appendix 6).

7 Implementation and results

7.1 Training a neural network

In this project an educator version of NeuroSolutions was used, with the limitations of maximum 2 hidden layers and 512 processing elements.

The success of the training is judged by the mean square error (e.g., a training curve plotted in the MegaScope should approach zero). If or when the network trains effectively, the training data (infile) and the backpropagation are removed, and the file for all of Sweden is fed through the network to get the estimations. Figure 7.1 shows the training curves for networks with a multilayer perceptron architecture using one hidden layer, the transfer function TanhAxon, and different numbers of processing elements. Figure 7.1.2 is the same as 7.1 but with two hidden layers.

Figure 7.1. Training curves for networks with one hidden layer (panels: 5, 20, 55 and 100 processing elements).

Figure 7.1.2. Training curves for networks with two hidden layers (panels: 10 processing elements in layer 1 and 5 in layer 2; 20 processing elements in layer 1 and 10 in layer 2).

7.1.1 Training results

During the efforts to train the network, the output tables and training curves showed that the network is not powerful enough to train to a satisfactory level. The different types of networks train to approximately the same level for both lead and cadmium. When other neural architectures were used, the results were equally bad. Most likely the data are too noisy, and the correlation between the variables and the pollution values is too weak to produce satisfactory estimates. Table 7.1.1 shows a sample of the desired output and the actual output for lead from the second network in Figure 7.1.

Table 7.1.1

Desired Pb    Actual output Pb
6.85          7.492055
5.36          7.710464
8.58          6.986342
11.6          7.215856
13.44         6.6897
10            6.63317
7.93          6.855217
13.89         6.582269
11.98         6.734963
9.91          6.887609
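To make the mismatch in Table 7.1.1 concrete, the mean square error of the ten sample rows can be computed directly. The sketch below uses only the values from the table; the variable names are illustrative.

```python
# Sample rows from Table 7.1.1 (lead).
desired = [6.85, 5.36, 8.58, 11.6, 13.44, 10, 7.93, 13.89, 11.98, 9.91]
actual = [7.492055, 7.710464, 6.986342, 7.215856, 6.6897,
          6.63317, 6.855217, 6.582269, 6.734963, 6.887609]

# Mean square error between desired and actual output.
mse = sum((d - a) ** 2 for d, a in zip(desired, actual)) / len(desired)

# Range of each column: how much the values vary across the sample.
range_desired = max(desired) - min(desired)
range_actual = max(actual) - min(actual)

print(f"MSE {mse:.2f}, desired range {range_desired:.2f}, "
      f"actual range {range_actual:.2f}")
```

For these ten rows the mean square error is about 17.6, and the network output varies over a range of roughly 1.1 while the desired values span about 8.5. In this sample the network returns nearly the same value for every station.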

7.2 Interpolate with ArcView

Since the efforts to train the neural networks for lead and cadmium failed, ArcView was used to interpolate the surfaces for lead and cadmium. The spline interpolation method was used because the moss stations are not points of emission, so the pollution values do not decrease with the distance from a moss station, as the IDW method assumes. The grids were calculated with a cell size of 500 m, not 100 m as first intended, because the interpolation methods are very demanding in time and computing power. If the calculated surfaces were to be used as estimates of the pollution values, an ASCII grid could be converted to a table with the C++ program written for this project (see appendix 5). Figure 7.2 shows a map of the calculated lead pollution values and Figure 7.3 a map of the calculated cadmium values.

Figure 7.2. Calculated lead levels in Sweden.

Figure 7.3. Calculated cadmium levels in Sweden.

8 Summary

8.1 About neural networks

I was not able to find the relationships between the pollution values and the variables considered in this project using neural networks even though the relationships are well known and easily proven on a larger scale.

There are many possible reasons for this situation. Foremost, the moss data are noisy. Perhaps if more variables were used to explain the pollution values, a neural network would train to a satisfactory level.

Another reason could be that the moss stations are located at a minimum distance from any house or road, and that the near vicinity of known pollution sources was avoided (see section 6.1). It would have been better if the locations of the moss stations had been chosen randomly. Since they are now located away from possible emitters, the impact of all variables except the north-south difference is minimized.

Even if the network had trained properly, one might ask what the results would show. Can a neural net model be trusted? A neural network can find relationships but not explain them. It is basically a mathematical model that does not attempt to specify the precise mechanisms by which causal variables affect outcomes. However, it is potentially a useful tool in the social and physical sciences, especially in GIS applications.

8.2 About ArcView

The interpolated surfaces (see section 7.2) show that pollution levels are higher in the south of Sweden and in the vicinity of population centres. These results are in accordance with the hypotheses developed earlier in this report (involving the variables used in training the neural network). But the surfaces do not provide a good estimation of the pollution values, since the only thing considered is the measured pollution values. However, the maps could be seen as a quality measurement of the moss survey: the moss stations are sufficient to show the environmental status in Sweden concerning heavy metals, since the maps are in accordance with the well-known hypotheses (e.g. vicinity of population centres). The quality of the interpolated surfaces could be determined by removing a number of moss stations from the data set and then comparing the interpolated values at the positions of the removed stations with the measured values.
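The proposed quality check can be sketched as a leave-one-out test: each station in turn is removed, its value is interpolated from the remaining stations, and the estimate is compared with the measurement. The sketch below uses a simple IDW interpolator as a stand-in, since ArcView's spline is not reproduced here; the function names are illustrative, and the five stations are the first five rows of Appendix 1 (Y_Koord, X_Koord, Pb).

```python
import math


def idw(cell, points, power=2.0, neighbours=3):
    """Inverse-distance-weighted estimate at cell from (x, y, value) points."""
    def distance(point):
        return math.hypot(point[0] - cell[0], point[1] - cell[1])

    weighted_sum = 0.0
    weight_total = 0.0
    for point in sorted(points, key=distance)[:neighbours]:
        d = distance(point)
        if d == 0.0:
            return point[2]
        weight = 1.0 / d ** power
        weighted_sum += weight * point[2]
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out_rmse(stations, power=2.0, neighbours=3):
    """Root mean square difference between measured and interpolated values."""
    squared_errors = []
    for index, (x, y, measured) in enumerate(stations):
        remaining = stations[:index] + stations[index + 1:]
        estimate = idw((x, y), remaining, power, neighbours)
        squared_errors.append((estimate - measured) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First five rows of Appendix 1: (Y_Koord, X_Koord, Pb).
stations = [
    (1639500, 6534500, 7.244),
    (1620500, 6544500, 8.09),
    (1589500, 6569100, 4.677),
    (1635900, 6559500, 7.77),
    (1648700, 6599500, 6.004),
]
rmse = leave_one_out_rmse(stations)
print(f"leave-one-out RMSE for Pb: {rmse:.3f}")
```

The same loop could compare interpolation methods or parameter settings by their leave-one-out error, provided the interpolator can be called outside ArcView.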

8.3 The project in general

During this project a lot of time was spent processing large data files (extracting from the TOPSWING database, converting between file formats and weighting the values). This would have been a problem if the network had trained successfully. If the infile for NeuroSolutions covering the whole of Sweden had been in ASCII format, its size would have been over 2.3 GB and hence too big for NeuroSolutions and Microsoft Access. Of course, the file could be divided into several smaller files, but this would have taken additional time.

If I were to continue this project, I would want a more powerful computer. I was not able to interpolate a grid with a cell size of 100 m over all of Sweden because of the limitations in computing power. More variables, such as annual precipitation and average temperature, would be introduced because physical processes as well as human activities affect heavy metal deposition levels. Methods for determining the quality of interpolated surfaces would be developed. I would also spend more time researching heavy metals and neural networks.

9 Conclusions

The neural networks (NeuroSolutions) were not able to find the desired relationships in this set of data.

The interpolation of the surfaces in ArcView gave the desired result. The maps created are in accordance with the hypotheses, but the quality of the result is unknown.

A method for determining the quality of interpolated surfaces must be developed.

REFERENCES

(1) Jose C. Principe, W. Curt Lefebvre, Neil R. Euliano. 1996. Neural Systems: Fundamentals through simulations.

(2) Abrahamsson, Kurt Viking. 1998. Heavy metals in sustainable Swedish society. SMC Kiruna . (3) ATSDR- Agency for Toxic Substances and Disease.

http://www.atsdr.cdc.gov

(4) ArcView help documentation.

(5) NeuroSolutions. Getting started manual.

(6) NeuroDimensions Inc.

http://www.nd.com

(7) Environmental Protection Agency.

http://www.epa.gov

(8) Western Maryland College.

http://www.wmc.car.md.us

(9) Swedish Environmental Research Institute.

http://www.ivl.se/


Appendix 1

SweMoss95

Y_Koord  X_Koord  Art  Cd     Cr     Cu     Fe       Ni     Pb
1639500  6534500  P    0.217  0.791  2.911  196.73   1.1    7.244
1620500  6544500  P    0.329  0.868  4.958  281.73   1.519  8.09
1589500  6569100  P    0.245  1.294  3.628  344.58   1.425  4.677
1635900  6559500  P    0.521  1.037  4.874  270.28   1.817  7.77
1648700  6599500  P    0.189  0.926  3.225  212.98   1.131  6.004
1674100  6594500  P    0.232  0.892  5.101  215.54   1.225  7.638
1634500  6609100  P    0.193  0.806  5.379  206.954  1.012  4.124
1664500  6611300  H    0.204  0.851  5.826  287.852  1.32   5.617
1674500  6635900  P    0.252  0.828  3.696  263.991  1.408  6.896
1620500  6535900  P    0.254  0.597  6.47   291      2.11   10.1
1600900  6545500  P    0.343  0.788  4.65   228      1.58   7.53
1645500  6561300  P    0.274  0.537  6.17   147      1.21   6.91
1615500  6600500  P    0.234  1.34   9.15   723      1.95   9.61
1640900  6625500  P    0.235  0.834  3.91   107      0.91   4.48
1650500  6620900  P    0.2    2.33   5.56   1480     1.7    6.17
1645500  6660900  P    0.269  0.828  4.22   145      1.23   7.78
1655900  6656300  P    0.217  1.22   5.21   104      1.61   5.65
1564500  6613700  P    0.251  1.976  3.908  299.125  1.5    5.003
1573700  6639100  P    0.213  1.255  4.715  241.866  1.289  5.441
1629500  6639500  P    0.264  0.715  3.575  225.461  0.997  4.752
1584100  6659500  P    0.14   1.049  4.087  214.536  1.144  5.814
1589100  6669500  P    0.143  1.149  4.113  237.449  1.121  6.696
1599500  6685900  P    0.23   0.563  2.705  104.797  0.741  2.496
1644500  6654100  P    0.158  0.663  3.055  183.637  0.737  3.781
1634500  6673700  P    0.247  0.534  4.461  219.222  0.889  3.841
1649100  6684100  P    0.168  0.509  2.952  147.242  0.85   3.649
1599100  6725900  P    0.158  1.438  3.893  189.31   1.463  5.292
1624500  6700500  P    0.254  0.531  4.094  128.274  0.932  3.384
1614100  6720500  P    0.169  0.8    4.478  187.936  1.325  5.195
1586300  6640900  P    0.149  0.563  4.24   100      0.566  4.44
1610500  6640500  P    0.237  1.17   5.29   321      1.72   8.05
1626300  6670500  P    0.144  0.897  4.57   221      1.09   6.03
1651300  6695900  P    0.19   1.03   6.14   138      1.22   4.79
1539500  6519500  P    0.177  0.527  3.414  197.023  0.87   3.03
1539100  6533700  P    0.275  1.043  3.995  314.462  1.302  6.356
1529500  6539100  P    0.163  0.54   3.898  156.741  0.795  3.612
1564100  6510500  P    0.172  0.596  4.911  278.391  1.106  7.198
1574500  6539500  P    0.216  0.707  5.207  278.882  1.288  4.886
1568700  6549100  P    0.251  1.318  3.978  336.801  1.802  7.723

All concentrations are expressed in µg/g dry weight at 40 °C.


Appendix 2

Employment in each 100m square:

SELECT aCountyNo, aCommunityNo, sni92, oid,
       OrganisationYearData.cfarnr, coordnorth, coordeast, count(*) num
FROM PersonYearOccupation
JOIN OrganisationYearData
  ON OrganisationYearData.oid = PersonYearOccupation.orgnr
 AND OrganisationYearData.cfarnr = PersonYearOccupation.cfarnr
 AND OrganisationYearData.CountyNo = PersonYearOccupation.aCountyNo
 AND OrganisationYearData.CommunityNo = PersonYearOccupation.aCommunityNo
WHERE PersonYearOccupation.Year = 1995
GROUP BY aCountyNo, aCommunityNo, sni92, oid,
         OrganisationYearData.cfarnr, coordnorth, coordeast

People in each 100m square:

SELECT CoordNorth, CoordEast, count(*)
FROM SMCDATA..PersonYearCoord
JOIN SMCDATA..Person
  ON SMCDATA..PersonYearCoord.PID = SMCDATA..Person.PID
WHERE Year = 1995
GROUP BY CoordNorth, CoordEast

Sectoral employment in each 100m square:

SELECT coordnorth, coordeast, count(*) num
INTO SMCWORK..colldat2
FROM PersonYearOccupation
JOIN OrganisationYearData
  ON OrganisationYearData.oid = PersonYearOccupation.orgnr
 AND OrganisationYearData.cfarnr = PersonYearOccupation.cfarnr
 AND OrganisationYearData.CountyNo = PersonYearOccupation.aCountyNo
 AND OrganisationYearData.CommunityNo = PersonYearOccupation.aCommunityNo
WHERE PersonYearOccupation.Year = 1995
  AND (PersonYearOccupation.sni92 = 2221
    OR PersonYearOccupation.sni92 = 22210
    OR PersonYearOccupation.sni92 = 2222
    OR PersonYearOccupation.sni92 = 22221)
GROUP BY coordnorth, coordeast


Appendix 3

Pb polluters: SNI codes

Industry type                     SNI codes
Commercial printing               2221 22210 2222 22221 22222
Inorganic pigments                2412 24120
Cement, hydraulic                 2651 26510
Ceramic wall and floor tile       263 2630 26300
Pottery products                  262 2621 26210 2622 26220 2623 26230 2624 26240 2625 26250 2626 26260 263 2630
Blast furnaces and steel mills    27 271 2710 27100 2735 27350 2752 27520
Steel wire and related products   272 2722 27221 27222 273 2731 27310 2734 27340
Cold finishing of steel shapes    2732 27320 2733 27330
Gray iron foundries               2721 27210 275 2751 27510
Primary copper                    2744 27440
Primary lead                      2743 27430
Primary nonferrous metals         2745 27450
Secondary nonferrous metals       2754 27540 2753 27530
Electric equipment & supply       311 3110 31100 312 3120 31200 313 3130
Electric service                  401 4010 40100
Refuse systems                    900 9000 90001 90002 90003 90004 90005 90006 90007
National security                 7522 75222 75223 75224 75225

Cd polluters: SNI codes

Industry type                     SNI codes
Metal finishing                   285 2851 28510
Inorganic pigments                2412 24120
Batteries, cells                  314 3140 31400
Electric equipment & supply       311 3110 31100 312 3120 31200 313 3130
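The sectoral employment query in Appendix 2 filters on SNI codes such as those listed in this appendix. Writing the condition by hand for every industry type is error prone, so one possible approach is to generate it. The sketch below is an illustration only: `sniFilter` is a hypothetical helper that was not part of the project, and it assumes the database accepts a standard SQL `IN` list in place of the chained `or` comparisons.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds a condition of the form "column IN (c1, c2, ...)" from a list of
// SNI codes. An empty list yields "1 = 0", a condition that matches no rows,
// because "IN ()" is not valid SQL.
std::string sniFilter(const std::string& column, const std::vector<int>& codes) {
    if (codes.empty()) return "1 = 0";
    std::ostringstream sql;
    sql << column << " IN (";
    for (std::size_t i = 0; i < codes.size(); ++i) {
        if (i > 0) sql << ", ";
        sql << codes[i];
    }
    sql << ")";
    return sql.str();
}
```

For the "Primary lead" row, for example, calling `sniFilter("PersonYearOccupation.sni92", codes)` with the codes 2743 and 27430 would produce `PersonYearOccupation.sni92 IN (2743, 27430)`.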


Appendix 4

Geoweight.cpp

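#include <cmath>
#include <fstream>
#include <iostream>
#include <string>
#include "klasser.h"

using namespace std;

int main() {
    string coordName, valueName, outName;
    int exponent = 0, radius = 0;

    cout << " Coordinate file?\n";
    getline(cin, coordName);
    cout << " Value file?\n";
    getline(cin, valueName);
    cout << " Output file?\n";
    getline(cin, outName);
    cout << " Distance exponent?\n";
    cin >> exponent;
    cout << " Radius from point?\n";
    cin >> radius;

    ifstream f(coordName.c_str());
    // Open modes are combined with bitwise OR; output is appended to the file.
    ofstream u(outName.c_str(), ios::out | ios::app);
    raster r;
    value v;
    value res;

    // For every target point, sum the surrounding values weighted by distance.
    while (f >> r.X >> r.Y) {
        res.X = r.X;
        res.Y = r.Y;
        res.num = 0;
        ifstream g(valueName.c_str());
        // The value file is assumed to be sorted by ascending X, so reading
        // stops once X lies more than the radius beyond the target point.
        while (g >> v.X >> v.Y >> v.num && (v.X - r.X) <= radius + 1) {
            if (v.X == r.X && v.Y == r.Y) {
                res.num += v.num;                   // same cell: full value
            } else {
                double dx = r.X - v.X, dy = r.Y - v.Y;
                double d = sqrt(dx * dx + dy * dy); // distance in metres
                if (d <= radius)                    // weight by distance in km
                    res.num += v.num / pow(d / 1000, exponent);
            }
        }
        g.close();
        u << res.X << " " << res.Y << " " << res.num << endl;
    }
    return 0;
}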
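The weighting rule used in Geoweight.cpp can be isolated in a single function, which makes it easier to check individual cases by hand. The sketch below is a restatement for illustration, not part of the original program; `weightedContribution` is a hypothetical name, and the parameters mirror the program's inputs (offsets and radius in metres, an integer distance exponent).

```cpp
#include <cmath>

// Contribution of one source cell to a target cell, following the rule in
// Geoweight.cpp: the distance is converted to kilometres and the value is
// divided by distance^exponent. dx and dy are the coordinate offsets in metres.
double weightedContribution(double value, double dx, double dy,
                            int exponent, double radius) {
    double d = std::sqrt(dx * dx + dy * dy);   // distance in metres
    if (d == 0.0) return value;                // same cell: full value
    if (d > radius) return 0.0;                // outside the search radius
    return value / std::pow(d / 1000.0, exponent);
}
```

With an exponent of 2, a value of 100 at a distance of 2000 m contributes 100 / 2^2 = 25. Because the distance is expressed in kilometres, cells closer than 1 km are divided by a number smaller than one and therefore contribute more than their own value.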

Klasser.h


// A target grid point.
class raster {
public:
    raster() { X = Y = 0; }
    int X, Y;
};

// A grid point with an associated value.
class value {
public:
    value() { num = 0; X = Y = 0; }
    int X, Y;
    double num;
};


Appendix 5

ASCII grid file format:

ncols 2
nrows 2
xllcorner 6124900
yllcorner 1208300
cellsize 100
nodata_value -55
1 2
5 6

grid_to_tbl.cpp:

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main() {
    string inName, outName, dummy;
    int ncols = 0, nrows = 0, xllcorner = 0, yllcorner = 0, cellsize = 0, nodata = 0;
    int X, Y, c = 0, r = 0;
    double value;

    cout << " Grid file?\n";
    getline(cin, inName);
    ifstream f(inName.c_str());
    cout << " Output file?\n";
    getline(cin, outName);
    ofstream g(outName.c_str());

    // Header: six keyword/value pairs; the keywords are read and discarded.
    f >> dummy >> ncols;
    f >> dummy >> nrows;
    f >> dummy >> xllcorner;
    f >> dummy >> yllcorner;
    f >> dummy >> cellsize;
    f >> dummy >> nodata;

    // The first data row in the file is the topmost row of the grid.
    X = (nrows * cellsize) + yllcorner - cellsize;
    while (X >= yllcorner) {
        Y = xllcorner;
        c = 0;
        while (Y <= (xllcorner + (ncols * cellsize) - cellsize)) {
            f >> value;
            if (value != nodata)  // ignore nodata values
                g << X << " " << Y << " " << value << "\n";
            Y += cellsize;
            c++;
        }
        X -= cellsize;
        r++;
    }

    if (r == nrows && c == ncols) {
        cout << ncols << " columns processed\n";
        cout << nrows << " rows processed\n";
    } else {
        cout << " error creating table " << c << " " << r;
    }
    return 0;
}


Appendix 6

NS-infile:

X Y Peo_num Emp_num Poll_num Road_Dist Cd Pb

6155500 1348900 1322.72 571.14 11.64 761.57 0.25 6.85

6163300 1345800 2037.96 901.46 14.34 223.6 0.2 5.36

6163300 1370800 595.21 140.29 4.89 1236.93 0.18 8.58

6166400 1361400 1333.7 292.16 8.28 200 0.18 11.6

6175800 1383300 375.29 50.24 1.44 200 0.28 13.44

6176700 1391400 475.98 34.23 0.03 412.31 0.24 10

6205500 1348900 789.62 296.04 6.3 1280.63 0.44 7.93

6205500 1373900 550.66 83.51 1.97 1000 0.28 13.89

6211100 1388900 1291.48 71.49 4.44 538.51 0.26 11.98

6211400 1346700 625.85 160.28 7.91 2692.58 0.26 9.91

6215500 1378900 741.34 50.89 3.93 1063.02 0.24 5.64

6216400 1361400 493.93 73.6 4.8 640.31 0.2 9

6221400 1376400 1014.98 109.47 11.75 1676.31 0.31 11.4

6221700 1326700 1440.55 449.27 16.87 2334.52 0.2 8.86

6223300 1400800 800.86 49.7 3.11 2061.55 0.28 9.98

6225800 1458600 759.5 146.36 12.54 1220.66 0.17 6.2

6231100 1373900 2248.89 211.05 16.81 800 0.27 8.14

6231100 1448600 1033.36 249.03 12.6 223.6 0.25 7.77

6231100 1473600 1561.67 401.67 33.38 100 0.28 10.22

6231100 1498600 2217.48 238.62 10.92 2668.33 0.22 7.06

6233300 1405500 569.08 51.45 1.81 1389.24 0.21 6.56

6234500 1364500 634.49 52.99 6.46 1414.21 0.25 8.92

6234500 1414500 524.72 181.91 3.26 3104.84 0.25 12.9

6235500 1463900 1242.93 603.37 93.47 360.55 0.29 6.43

6235800 1313900 2055.92 372.47 48.32 100 0.31 5.71

6235800 1413900 481.68 187.53 3.05 3252.69 0.49 12.85

6236400 1346700 539.06 54.81 3.39 2353.72 0.32 8.4

6239200 1354200 376.55 36.26 2.82 761.57 0.26 7.66

6239200 1404500 436.05 43.75 1.01 538.51 0.36 10.2

6240500 1453900 448.12 111.26 11.43 1077.03 0.25 7.27

6240800 1403900 421.11 40.85 1.05 989.94 0.29 10.81

6241100 1503600 466.11 97.77 3.67 0 0.26 8.8

6241400 1336400 690.39 70.08 4.84 2325.94 0.21 7.34

6241400 1436700 608.72 204.79 3.15 100 0.31 11.2
