Integrating geographic and information visualization for interactive exploration of statistical data

Department of Science and Technology (Institutionen för teknik och naturvetenskap)

Linköpings Universitet

LITH-ITN-MT-EX--05/031--SE

Nina Feldt

Henrik Pettersson

Degree project in Media Technology, carried out at Linköping Institute of Technology (Linköpings Tekniska Högskola), Campus Norrköping

Supervisor: Mikael Jern

Examiner: Mikael Jern

Norrköping 2005-04-20

URL for electronic version: http://www.ep.liu.se/exjobb/itn/2005/mt/031/



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, see the publisher's home page: http://www.ep.liu.se/

Nina Feldt, Henrik Pettersson

NVIS - Norrköping Visualization and Interaction Studio, Linköping University

ninfe258@student.liu.se, henpe309@student.liu.se

April 2005

Abstract

The fast growing quantity of statistical data with geographical reference, accessible on the Web, creates a growing demand for more powerful exploratory analysis techniques for investigation of socio-economic phenomena in association with their geographical location. Statistics Sweden provides large amounts of census data of economic, social, demographic, and other matters of interest to the governments, government departments, local authorities, businesses, and to the general public. The current techniques available to analyze these data sets are limited. Therefore, this diploma work was put together, with the intention of integrating methods used in geographic visualization and information visualization into a prototype tool for interactive exploration of statistical data. This report investigates the tools present today and describes the development of a new one called GeoWizard - a tailor-made easy-to-use application for Statistics Sweden’s databases.

We would like to express our gratitude to the examiner, Professor Mikael Jern at the Department of Science and Technology, Linköping University, for ideas leading to this diploma work. He and our advisor, doctoral student Jimmy Johansson, have been an enormous help during the entire project, giving us instant feedback, questions and suggestions. Thanks also to Bo Justusson at Statistics Sweden for valuable feedback regarding the GeoWizard application.

Nina Feldt

Henrik Pettersson

April 2005

List of Figures

1.1 The PX-Web user interface

1.2 The PX-Map application

2.1 Snow’s map, displaying a cholera outbreak

2.2 Minard’s visualization of Napoleon’s march

3.1 The .NET Framework architecture

3.2 The .NET execution process

3.3 Visual Studio .NET

3.4 Layout hierarchy in the Windows Forms Designer

4.1 The GeoWizard application

4.2 Concept of Parallel Coordinates Browser

4.3 Layered component-oriented system architecture

4.4 Retrieving data from the Statistics Sweden database

4.5 Data flow of GeoWizard

5.1 Colour map based on tax rate

5.2 Threshold adjustments via the legend

5.3 Filtering out low values with the range sliders

5.4 Dynamic colour range

5.5 Dynamic queries in the parallel coordinates browser

5.6 Paint tool

5.7 Bars in the map

6.1 Features of GeoVISTA and GeoWizard compared

6.2 Features of CommonGIS and GeoWizard compared

6.3 Visualization of economic attributes in City’O’Scope

6.4 Features of City’O’Scope and GeoWizard compared

6.5 Kommundatabasen

A.1 OpenViz scene tree diagram - choropleth map

A.2 OpenViz scene tree diagram - parallel coordinates browser

A.3 OpenViz scene tree diagram - scatter plot matrix

Some common abbreviations and expressions used in this report:

GeoWizard - The application developed during the diploma work

Infovis - Information visualization

Geovis - Geographic visualization

GUI - Graphical User Interface

.NET - An extensive framework for software development

PCB - Parallel Coordinates Browser, visualization component used in GeoWizard

Statistics Sweden - Central government authority responsible for official statistics, known in Swedish as Statistiska Centralbyrån (SCB)

Chapter 1 Introduction

This report describes the master's thesis Integrating geographic and information visualization for interactive exploration of statistical data. The thesis was performed as part of the Master of Science program in Media Technology and Engineering at Linköping University, Sweden.

1.1 Background

Rapid advances in geographic information systems have created a potential for dynamic geovis to be integrated with exploratory infovis. The fast growing quantity of statistical data accessible on the Web creates possibilities for such integrated tools to be used in a wide range of application domains. Analysts can search for relationships, patterns and trends to gain understanding and knowledge of the data, without having any prior assumptions or theories. Although researchers have made substantial advances in both visualization fields over the past decade, many challenges remain. As statistical data sets become increasingly large and complex, users require more effective multi-dimensional visualization tools and faster interactive performance to be able to understand them. This challenge demands improved fundamental methods for visual exploratory analysis.

The Web is today’s primary medium for distribution of geospatial data and maps, but most web-enabled integrated infovis and geovis applications have limited interactive capabilities.

Statistics Sweden provides large amounts of census data on economic, social, demographic, and other matters of interest to governments, government departments, local authorities, businesses, and the general public. For instance, population and economic census information is of great value in planning public services, such as education, fund allocation, and public transport, as well as in private businesses, for example, when placing new factories, shopping malls, or banks, or when marketing particular products. Moreover, survey data on specific topics, such as labour force, time use, and household budget, are regularly collected to keep information on economic and social phenomena up to date.

The practice of geographically referencing census data has increasingly spread over the last few decades, and the techniques for attaching socio-economic data to specific locations have markedly improved at the same time. In Sweden, for instance, a large share of the official data is provided for each municipality. Computerized maps of these municipalities enable the investigation of socio-economic phenomena in association with their geographical location. These advances cause a growing demand for more powerful exploratory analysis techniques that can link population data to their spatial distribution.

1.1.1 Statistics Sweden’s current visualization methods [20]

PC-Axis is a system used by Statistics Sweden and other statistical offices to manage statistical information. The PC-Axis system contains a number of specialized modules for different tasks; among them are PX-Web and PX-Map. PX-Web powers the public statistical database on Statistics Sweden’s website [21], and PX-Map is a thematic mapping module for PC-Axis files developed by Statistics Norway [26].

When accessing Statistics Sweden’s databases through the PX-Web user interface (which uses regular HTML forms), the user can view the selected information as symbols in a map (see figure 1.1). The regions can be clicked, making a small table of information appear. The map navigation is not very easy to use; for example, zooming in on the regions around Stockholm takes some time. The interactivity needs to be improved, and methods for visualizing more than two or three attributes are desired.

PX-Map is a Windows application that can present statistical data in the PC-Axis format as either a choropleth map or a symbol map (see figure 1.2). The application supports basic user interactions such as panning, zooming and selection of regions. PX-Map can show only one attribute at a time, which limits its usefulness as a data exploration tool.

1.1.2 Related work

Visualization of geospatial data has been the subject of several research projects. The results are both extensive general systems and more specialized applications. Among the general systems are GeoVISTA Studio [11], an open source Java-based visual programming environment for geospatial visualizations, and CommonGIS [5], another Java-based platform. Examples of specialized applications are City’O’Scope [3] by Macrofocus [15], a tool for comparison of economic attributes of cities around the world, and Dynamaps [6], a tool for visualization of data from the U.S. Census Bureau. These systems and applications, with the exception of Dynamaps, are compared with the GeoWizard application in section 6.2. Dynamaps was excluded because a functioning version of the application could not be found, and it would not be fair to base the comparison on static pictures.

Figure 1.1: The PX-Web user interface

Figure 1.2: The PX-Map application

1.2 Objectives

The problem addressed was the supposed lack of interactive applications for multivariate, abstract, and geographically referenced data. The aim of the project was to investigate how infovis and geovis techniques can be integrated in order to visualize and interact with geographically referenced multivariate census data sets. Based on the investigation, a prototype visualization tool should be developed, hopefully with great potential in supporting good public policy and in underpinning the effective functioning of a democratic society. The objective was to create an easy-to-use exploratory tool, tailor-made for the databases of Statistics Sweden. Focus should be on innovative user interface technology as well as integration of already established components from previous work.

1.3 Methods

The diploma work consisted of two parts: a theoretical and a practical. In the former, a literature study and an investigation of related applications were performed. The result of this part was a list of functional requirements and a GUI design. In the practical implementation phase, an application named GeoWizard was developed. During the process, some early requirements were set aside, while new ones were added. Refinements and changes were made after discussion with, and feedback from, advisors. The application was evaluated both by Statistics Sweden and by the authors, in a comparison with similar commercial and academic applications.

1.4 Target audience

This report is intended for an audience with a technical background and interest in infovis. The section describing tools and technologies, chapter 3, is more technical than the rest of the report. It is mainly intended for people interested in the actual implementation of GeoWizard or in software development in general.

1.5 Structure of report

The report is structured as follows. Chapter 2, Data visualization, presents the background of data visualization and introduces techniques within it. Chapter 3, Tools and technologies, describes and discusses the tools and technologies used in the development of the application. In chapter 4, GeoWizard, the requirements, system architecture, graphical user interface, and functions of the application are explained and illustrated. In chapter 5, Case study, a hypothetical use for the application is demonstrated. The application is evaluated by Statistics Sweden and compared with similar systems in chapter 6, Evaluation. Chapter 7, Conclusions and future work, describes the authors’ conclusions as well as suggestions for future improvements. There are also three appendices: Appendix A describes how the visualization techniques were implemented, Appendix B contains the presentation document sent to Statistics Sweden, and Appendix C contains the evaluation by Statistics Sweden.

Chapter 2 Data visualization

Visual information plays an important role when interacting with the world. Without the ability to manage this information, we would be severely handicapped in everyday life. About half of the capacity of the primate cortex is devoted to visual information processing. The human visual system is a skilful and powerful pattern seeker and recognizer. Visual perception and cognition are interrelated - just think of the phrase ”I see”, often synonymous with ”I understand”.

2.1 Definition of visualization

Data visualization is the process whereby non-visual information is transformed to visual information. By taking better advantage of the human cognitive capabilities, the process of data exploration and analysis is enhanced, making it possible to handle the huge amounts of data available today. Data visualization can be defined as follows:

“Visualization is the process of exploring, transforming, and viewing data as images (or other sensory forms) to gain understanding and insight into the data.”

Schroeder, Martin and Lorensen [25]

Data visualization is partly based on techniques developed for image processing and computer graphics. The boundaries between the three research areas overlap to some extent; the end product in a visualization is an image, rendered with computer graphics. Image processing methods can then be used to, for example, mark interesting areas in the image. According to Schroeder et al., what separates visualization from computer graphics and image processing, making it unique, is:

• The dimensionality of data is three dimensions or greater. Many well-known methods are available for data of two dimensions or less; visualization serves best when applied to data of higher dimensions.

• Visualization concerns itself with data transformation. That is, information is repeatedly created and modified to enhance the meaning of the data.

• Visualization is naturally interactive, including the human directly in the process of creating, transforming, and viewing data.

Thus, visualization is a transformation of data in one or several steps, where the end product is an image, and where the user has the possibility to control and manipulate each step of the process. According to Mackinlay [14], data visualization must be:

• Expressive.

In an expressive visualization, all available data is represented and no new data has been introduced.

• Effective.

An effective visualization helps the user understand the information presented. The effectiveness depends not only on the actual visualization, but also on external factors, like the user’s ability to interpret information. When interactivity is facilitated, the user can adjust the visualization according to his or her ability and needs, and the effectiveness increases.

Visualization is not a new phenomenon; even some of the earliest rock carvings can be considered as such, but more sophisticated visualizations were not seen until the 19th century. A few historical milestones in the visualization of information with a geographic component are Snow’s cholera map and Minard’s visualization of Napoleon’s army.

Figure 2.1: Snow’s map, displaying a cholera outbreak

The cholera map (see figure 2.1), created by Dr. John Snow, displays the deaths from cholera in relation to the locations of public water pumps. It showed that the deaths were concentrated around a particular pump, which was found to be contaminated. Another famous example is Charles Minard’s visualization of the progress of Napoleon’s 1812 Russian campaign, shown in figure 2.2. It shows several attributes: the size of the army, its geographical location, the direction of its movements, and temperature. The weight of the lines is proportional to the size of the army, while the colours indicate the direction. The position of the lines shows the geographical location. There is also a temperature graph, showing the temperatures during the retreat from Moscow.

Figure 2.2: Minard’s visualization of Napoleon’s march

2.2 Information visualization

The introduction and advance of computers have naturally led to more advanced visualization methods, such as modern infovis, where interaction techniques are just as important as the visualization itself. Infovis has a strong interdisciplinary base and connections to HCI (Human-Computer Interaction), information systems, GIS (Geographic Information System), graphic design, cognitive psychology, and scientific visualization. Card, Mackinlay and Shneiderman [4] define infovis as follows:

“Information Visualization: the use of computer-supported, interactive, visual representations of abstract data to amplify cognition.”

Thus, infovis is based on abstract data, making it non-trivial to present on screen. Many research projects focus on finding techniques to present vast amounts of non-spatial and non-numerical data in an intuitive way. A concise description of the basic concept of many infovis systems is provided by the Shneiderman mantra [24] for information seeking:

“Overview first, zoom and filter, then details-on-demand”

It is important to first give the user an overview, and then let her interact with the visualization to filter the information or to view details. This way, new insights are gained, just like when John Snow found a relationship between cholera deaths and a contaminated water pump. Bederson and Shneiderman [23] refer to infovis tools as telescopes and microscopes, which can lead to the discovery of new relationships. A guiding star in infovis is interactivity, which facilitates the process of understanding and exploring the data in order to find relationships, patterns, clusters, outliers and gaps.
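The three stages of the mantra can be illustrated with a minimal Python sketch. It is not taken from GeoWizard; the region names, attribute values and function names below are invented purely for illustration.

```python
from statistics import mean

# Hypothetical sample records; names and numbers are invented for illustration.
REGIONS = [
    {"name": "A-town", "county": "North", "tax_rate": 31.5, "population": 12000},
    {"name": "B-ville", "county": "North", "tax_rate": 33.0, "population": 54000},
    {"name": "C-borg", "county": "South", "tax_rate": 29.8, "population": 230000},
    {"name": "D-stad", "county": "South", "tax_rate": 32.2, "population": 8000},
]


def overview(records):
    """Overview first: summarize all records as a mean tax rate per county."""
    counties = {}
    for r in records:
        counties.setdefault(r["county"], []).append(r["tax_rate"])
    return {county: round(mean(rates), 2) for county, rates in counties.items()}


def zoom_and_filter(records, county, min_tax):
    """Zoom and filter: keep one county and only rates at or above a threshold."""
    return [r for r in records if r["county"] == county and r["tax_rate"] >= min_tax]


def details_on_demand(records, name):
    """Details on demand: return the full record of one selected region."""
    return next((r for r in records if r["name"] == name), None)
```

In an interactive tool each stage would be driven by user input (for example a range slider for the threshold), but the order of operations is the same: summarize everything, narrow down, then inspect single items.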


Chapter 3 Tools and technologies

When deciding which tools and technologies to use for the GeoWizard project, several criteria were considered:

Rapid prototyping: It should be possible to develop a fully functional prototype within the time frame of the project. Due to the vague requirements of the project, a development environment that supported an iterative approach to the programming was desirable. This way, different variations of the application could be tried out without the need to start from scratch in each iteration.

High-level focus: Easy and productive GUI building should be supported, so the work could be focused on solving the problem at hand, instead of concentrating on programming basic functional code, like low-level graphics routines and GUI functions.

Support for available data formats: The input data should require little or no modifications to be used.

Learning curve: The amount of time spent on learning the various tools should be as short as possible to enable project completion within the time frame.

Based on these requirements, Microsoft Visual Studio .NET and the visualization SDK (Software Development Kit) OpenViz Particles by AVS (Advanced Visual Systems) were considered the best choices. The OpenViz SDK contains a large library of efficient and robust components. An alternative would be to develop custom components from scratch with APIs (Application Programming Interfaces) like OpenGL, but that would move the focus from high-level problem solving to low-level graphics development.

3.1 Microsoft .NET

Microsoft .NET is Microsoft’s development and product strategy for the Internet Age. The large scope of the strategy makes it hard to describe it concisely; according to Microsoft [16], however, the main objective is to connect information, systems and devices so people can collaborate and communicate more efficiently. This strategy manifests in different forms based on the specific user domain. The domains can be roughly categorized as:

Clients: Windows XP and Windows CE provide regular users with a familiar user interface to a number of .NET-influenced applications, such as MSN Messenger and Internet Explorer.

Servers: Microsoft Windows Server 2003, Microsoft SQL Server, and Microsoft BizTalk Server integrate, run, operate, and manage Web services and Web-based applications. Web services are small, reusable applications that help computers from many different operating system platforms work together by exchanging messages through standard Internet protocols.

Developer tools: Visual Studio .NET and the .NET Framework SDK provide developers with powerful tools to write .NET compliant applications. This is the part of the .NET strategy that relates to the GeoWizard application, so it is described in more detail in the following section.

3.1.1 .NET Framework

The .NET Framework [18] is the development and execution environment used for .NET applications. It is designed according to the component-oriented programming paradigm and is intended to replace Microsoft’s former component architecture, COM (Component Object Model), used in Visual Studio 6 and other earlier Microsoft products. A component is a reusable object encapsulating a set of related functions. Components can be deployed independently and interact with other components or applications through a strict interface defining their functions.
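The idea of a component that is only reachable through a strict interface can be sketched in a few lines of Python. This is an illustration of the general principle, not .NET code; the names ColourMapper, ThresholdMapper and colour_regions are hypothetical.

```python
from typing import List, Protocol, Sequence


class ColourMapper(Protocol):
    """Hypothetical interface: the only thing client code may rely on."""

    def colour_for(self, value: float) -> str: ...


class ThresholdMapper:
    """One component implementing the interface; its internals stay hidden."""

    def __init__(self, threshold: float) -> None:
        self._threshold = threshold

    def colour_for(self, value: float) -> str:
        return "red" if value >= self._threshold else "blue"


def colour_regions(values: Sequence[float], mapper: ColourMapper) -> List[str]:
    """Client code that depends on the interface, not on a concrete component."""
    return [mapper.colour_for(v) for v in values]
```

Because colour_regions only uses the interface, a different mapper component could be swapped in without changing the client, which is the reuse benefit the component-oriented paradigm aims for.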

The main benefits of using the .NET Framework are:

Standard Internet protocols and specifications: TCP/IP (Transmission Control Protocol/Internet Protocol), SOAP (Simple Object Access Protocol), XML (eXtensible Markup Language) and HTTP (HyperText Transfer Protocol) are used to facilitate connections between a broad range of information, systems, and devices.

Language independence: The framework is language neutral and currently supports over 20 different programming languages. Among them are C++, JScript (the Microsoft version of JavaScript) and J# (the Microsoft variant of Java). The main languages used in most .NET projects, however, are C#, an object-oriented language similar to Java and C++, and Visual Basic .NET, an improved object-oriented variant of the Visual Basic language. Software development that supports many languages becomes more effective; each part of the whole system can be implemented in the language most suitable, or most familiar to the specific programmer.

Rapid application development: Due to a consistent programming model for using pre-packaged class libraries, application development is faster and easier. The framework base classes (see section 3.1.1) manage system resources, freeing the developer from much of the ”plumbing” (memory management, thread handling etc.) required for software development.

Platform independence: The .NET Framework is available for a variety of Windows platforms. The technology has been ported to other platforms in projects such as Mono [17], which aims to provide the .NET Framework for Linux-based platforms.

The .NET Framework consists of three main parts: the CLS (Common Language Specification), the FCL (Framework Class Libraries) and the CLR (Common Language Runtime), illustrated in figure 3.1.

The Common Language Specification (CLS)

The CLS is a set of rules and requirements that all .NET languages must follow. The specification ensures language interoperability by defining a set of features that all .NET languages support. It can also be used by developers of class libraries to determine whether their classes are CLS-compliant. Compliance requires the library to use only CLS features in the API it exposes to other code. The benefit of ensuring CLS-compliance is that the library is guaranteed to be accessible from any other .NET programming language.

The Framework Class Libraries (FCL)

The framework contains an extensive object-oriented class library providing developers with common software functionality. The main parts of the library, shown in figure 3.1, are:

• Base classes. A large part of the library that handles data types, thread management, low-level communication, security management, and much more.

• Data management. Supports data management and manipulation via XML and ADO.NET.

• Windows Forms. Facilitates the development of Windows GUI desktop-based applications.

• ASP.NET classes. Supports the development of Web-based applications (also known as Web forms) and Web services.

Because pre-packaged functionality is easily accessible within the .NET development environment, developers can be more efficient and spend less time on common operations.

Figure 3.1: The .NET Framework architecture.

The Common Language Runtime (CLR)

The CLR is the execution engine for .NET Framework applications. It is a platform-specific implementation, responsible for translating the platform-independent .NET code to native code. The concept is similar to the JVM (Java Virtual Machine). Different JVMs for each platform, for example, one for Linux and another for Windows, transform the same Java code to different types of CPU-specific native code. In the .NET case, though, the situation is a bit more complex due to the number of different languages supported. The solution is to compile the source code in all languages into a common set of CPU-independent instructions, MSIL (Microsoft Intermediate Language), which is then translated to native code by the CLR. The execution process is shown in figure 3.2.

The code executed by the CLR must be managed. Managed code in this context means that the MSIL code must provide a minimum level of information, or metadata, to the runtime engine in order to utilize services such as exception handling, memory management, cross-language integration, and code access security. Most standard .NET languages are managed by default, one exception being Visual C++. Unmanaged code runs outside the CLR. COM components, ActiveX components, and Win32 API functions are examples of unmanaged code.

Figure 3.2: The .NET execution process. The code is compiled in two steps; first to MSIL code and then to native code.
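The two-step idea (source code to CPU-independent instructions, which a runtime then executes) can be observed with Python’s own toolchain. This is only an analogy and not .NET code: Python bytecode stands in for MSIL, and the standard CPython runtime interprets the bytecode rather than translating it to native code as the CLR does.

```python
import dis

SOURCE = "def add(a, b):\n    return a + b\n"

# Step 1: translate source text into CPU-independent instructions (Python bytecode).
code_object = compile(SOURCE, "<sketch>", "exec")
instruction_names = [ins.opname for ins in dis.get_instructions(code_object)]

# Step 2: hand the intermediate code to the runtime, which executes it.
namespace = {}
exec(code_object, namespace)
result = namespace["add"](2, 3)
```

The list instruction_names holds the names of the intermediate instructions; the exact names differ between Python versions, just as the intermediate representation is an internal contract between compiler and runtime rather than something the programmer writes directly.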

3.1.2 Visual Studio .NET

Visual Studio .NET is Microsoft’s IDE (Integrated Development Environment) for the .NET Framework. An IDE is a computer application designed to support programmers in developing software. It usually includes a source code editor, a visual GUI builder, a compiler and a debugger. Modern IDEs such as Visual Studio .NET also contain a class browser, an object inspector and a class hierarchy diagram to facilitate object-oriented development. As the primary IDE for .NET-based projects, Visual Studio .NET has a large number of functions and features. Describing all of these is beyond the scope of this report, so only those relevant for the GeoWizard project are described in more detail.

The solution explorer

The solution explorer is a hierarchical tree view, similar to the regular Windows Explorer interface, showing the contents of conceptual containers called solutions and projects. A solution contains one or more projects and metadata related to the solution. A project contains source files, references, and metadata. Source files are the files containing all the code. In C#, source files have the extension .cs. References are a collection of components used by the project, either external (such as skybound.visualStyles in figure 3.3-a) or internal (that is, the output of another project in the solution, such as GSViz in figure 3.3-a). Solutions are programming language independent and designed to manage a set of related projects, where each project can be in a different .NET language.

Figure 3.3: The Visual Studio .NET GUI, showing: the solution explorer (a), the source code editor (b), the Windows Forms designer (c), and the toolbox (d).

The source code editor

The source code editor (figure 3.3-b) has all the regular features of a good code editor such as syntax highlighting, indentation of code blocks and context sensi-tive help. IntelliSense, a Microsoft technology designed to increase the efficiency and reduce the error rate of programming, provide dynamic completion of class, method and variable names. For example, typing ”dou” with IntelliSense enabled, the editor presents the user with a list of appropriate choices, in this case probably a type definition for the data type double. Another useful feature is the collapsible code blocks, where a region specified by the user can be collapsed into a single line. The collapsible region can encapsulate any part of the code, making it possible to hide methods or even entire classes.

Figure 3.4: Layout hierarchy in the Windows Forms Designer. The coordinate system of each control is based on the parent control, as illustrated by the ViewerLeft control.

Windows Forms designer

Windows Forms Designer (WFD) is the visual GUI builder of Visual Studio .NET. Based on the Windows Forms class library of the .NET Framework, it provides a WYSIWYG16 environment for the creation of highly interactive Windows applications (see figure 3.3-c).

A GUI built with WFD consists of forms and controls. Forms are the container components used to present information to, and receive input from, the user. They can be standard windows (as in GeoWizard), MDI17 windows, dialog boxes or display surfaces for graphics. Embedded in the form are controls, the visual components representing a specific interface element. Controls can be simple, such as a standard button with functionality for handling click events, or advanced, such as a data grid with functionality for sorting and data binding. Custom controls can easily be developed and integrated, due to the component-oriented architecture of the Windows Forms class library and the .NET Framework as a whole.

The controls on a form are arranged in a hierarchy with the form component at the top. The controls on each level of the hierarchy have a coordinate system based on their parent control, starting with (0,0) in the top-left corner and ending with (parent.width, parent.height). Each control is aware only of its parent’s dimensions, so its size and location are relative to the parent’s edges (see figure 3.4).
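The parent-relative coordinate system can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the Windows Forms API; the Control class and the names MainForm and MapView are hypothetical, and only ViewerLeft is taken from figure 3.4. The absolute position of a control is obtained by summing the offsets up the hierarchy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Control:
    """Hypothetical stand-in for a control; not the Windows Forms class."""
    name: str
    x: int          # left edge, relative to the parent
    y: int          # top edge, relative to the parent
    width: int
    height: int
    parent: Optional["Control"] = None

    def absolute_position(self) -> Tuple[int, int]:
        """Walk up the hierarchy, summing the parent-relative offsets."""
        x, y = self.x, self.y
        node = self.parent
        while node is not None:
            x += node.x
            y += node.y
            node = node.parent
        return x, y


# A three-level hierarchy: form -> ViewerLeft -> a view inside it.
form = Control("MainForm", 0, 0, 800, 600)
viewer_left = Control("ViewerLeft", 10, 40, 400, 500, parent=form)
map_view = Control("MapView", 5, 5, 390, 240, parent=viewer_left)

print(map_view.absolute_position())  # (15, 45)
```

Because each control stores only offsets from its parent, moving ViewerLeft moves everything inside it without any of the child coordinates changing.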

GUIs are created with the WFD by dragging and dropping components from the Toolbox (see figure 3.3-d) onto a form component. Properties for a specific control can be adjusted by clicking on it and changing the settings in the properties view. Advanced layout management is provided by the anchor and dock properties, implemented by all control components. Anchoring a control ensures that the anchored edges remain in the same position relative to the edges of the parent container at all times. The dock property is related to the anchor property and is used to specify that a control should dock to an edge of its container. By docking a control to all edges at the same time, it can be made to fill the entire area of its parent container. By intelligently setting the anchor and dock properties of the controls, a dynamic form can easily be created, in which the size and position of the child controls are adjusted in accordance with the form.

16 WYSIWYG - What You See Is What You Get

17 MDI - Multiple Document Interface
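The effect of anchoring can be sketched as a small layout computation. The following Python sketch is not the Windows Forms implementation; the function names are hypothetical, and the case where neither of two opposing edges is anchored is simplified so that the control stays in place.

```python
def anchor_layout(rect, old_parent, new_parent, anchors):
    """Return the new (x, y, w, h) of a child after its parent is resized.

    rect       -- (x, y, w, h) of the child, relative to the parent
    old_parent -- (width, height) of the parent before the resize
    new_parent -- (width, height) of the parent after the resize
    anchors    -- subset of {"left", "top", "right", "bottom"}
    """
    x, y, w, h = rect
    dw = new_parent[0] - old_parent[0]
    dh = new_parent[1] - old_parent[1]

    if "right" in anchors:
        if "left" in anchors:
            w += dw      # both edges pinned: stretch horizontally
        else:
            x += dw      # only the right edge pinned: move with it
    if "bottom" in anchors:
        if "top" in anchors:
            h += dh      # both edges pinned: stretch vertically
        else:
            y += dh      # only the bottom edge pinned: move with it
    return x, y, w, h


def dock_fill(new_parent):
    """Docking to all edges: the child covers the whole parent area."""
    return 0, 0, new_parent[0], new_parent[1]


all_edges = {"left", "top", "right", "bottom"}
stretched = anchor_layout((10, 10, 100, 50), (200, 100), (300, 150), all_edges)
moved = anchor_layout((10, 10, 100, 50), (200, 100), (300, 150), {"right", "bottom"})
print(stretched)  # (10, 10, 200, 100)
print(moved)      # (110, 60, 100, 50)
```

A control anchored to all four edges keeps a constant margin and therefore grows with its parent, while a control anchored only to the right and bottom edges keeps its size and follows the lower-right corner.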

3.2 OpenViz Particles [19]

OpenViz Particles is a data visualization engine and development platform by AVS. It is composed of fine-grained Java and COM components that can be combined in any number of ways to create powerful visualization applications. By building the component-based architecture on COM and Java technology, the resulting visualizations can be deployed on a wide array of client platforms.

Because the OpenViz components are implemented using COM on Windows platforms, they cannot be used directly in .NET-based applications; instead, a bridging technology called “.NET/COM Interoperability” must be employed. The basic concept of this technology is the creation of a wrapper class, usually called RCW18, for each COM component. The wrapper translates the interfaces exposed by the COM component into .NET-compatible interfaces. In essence, the .NET application accesses the functionality provided by the OpenViz components through the RCWs, not the COM components themselves. While this process might seem very complicated, the RCWs are fortunately generated automatically by Visual Studio .NET.

3.2.1 The visualization pipeline

The visualization pipeline is a conceptual model for representing the data flow in a visualization application. It is often used in visualization systems as a high-level abstraction of the steps performed to create a specific visualization. In OpenViz, the pipeline consists of three main steps: data access, data analysis, and visualization.

3.2.2 Data access

The first step of the visualization pipeline is to access the data and convert it to the internal data format used by OpenViz. Data access is provided by reader components that read data from a particular format, and mapper components that process regular data arrays or database formats such as ADODB19 or JDBC20. For example, the map in GeoWizard is accessed with the VzReadShapefile component, a specialized reader for shape files (see section 3.3), while the data attributes are accessed with the VzTableMapper component, a general mapper for tabular data such as Excel spreadsheets.

The internal data model used to store and process data by OpenViz is called Field. The field data model has been developed and continually refined by AVS over the past fifteen years, making it a highly effective and stable technology. It has been specifically designed for flexibility and scalability and can handle a large variety of input data sources, such as database tables, XML data and map files.

From a developer’s perspective, the field acts as a data encapsulation object, containing data values as well as information on where specific points are located, how they relate to other points, and how the data values should affect the appearance of the visualization. The field can be assigned as input to any OpenViz component requiring input data, for example, data manipulation or visualization components, or generated from components that create or transform data in some way. A single field is composed of a set of nodes, where each node can represent, for example, an index into an array or a coordinate location. Each node can have one or more data items connected to it. In GeoWizard, for example, each region is a node in the field and each attribute, for instance, tax rate, is a data item associated with the specific node.
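The node and data item structure can be illustrated with a minimal sketch. It is written in Python and is not the OpenViz Field API; the classes Node and Field, the method column, and all attribute values are made up for illustration, and only the region name Upplands Väsby and the attribute tax rate come from this report.

```python
from dataclasses import dataclass, field as dc_field
from typing import Dict, List


@dataclass
class Node:
    """One node of the field; in GeoWizard terms, one region."""
    label: str
    data: Dict[str, float] = dc_field(default_factory=dict)


@dataclass
class Field:
    """Hypothetical container holding all nodes and their data items."""
    nodes: List[Node] = dc_field(default_factory=list)

    def column(self, attribute: str) -> List[float]:
        """Return one data item for every node, in node order."""
        return [node.data[attribute] for node in self.nodes]


# Illustrative values only; they are not taken from Statistics Sweden.
regions = Field([
    Node("Upplands Väsby", {"tax_rate": 31.5, "population": 37000.0}),
    Node("Region B", {"tax_rate": 33.0, "population": 12000.0}),
])

print(regions.column("tax_rate"))  # [31.5, 33.0]
```

Any component that accepts such an object can read the same nodes and data items, which is what allows several visualizations to share one input.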

The benefit of having a single data model representing every form of data in OpenViz is that it facilitates the fusion of data from multiple sources, often in different formats, within a single OpenViz view. For example, a developer can load one set of data values from an XML file and another set from a database, and visualize and transform this data as if it were one single data set.

3.2.3 Data analysis and processing

When the raw data has been converted into an OpenViz field, it is often necessary to reduce or transform the data set to bring out the relevant information. While this can be done before the data is converted into an OpenViz field, it can be useful to import all the available data and use the data analysis components of OpenViz to allow the user to interactively manipulate the data. With the analysis components, the data in a field can be reduced, sorted, aggregated or operated on with a user specified mathematical equation.
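The four kinds of operation mentioned above (reduce, sort, aggregate, and a user-specified equation) can be sketched on plain Python rows. This is only an illustration of the operations, not of the OpenViz analysis components; the function names, attribute names and values are all hypothetical.

```python
# Hypothetical rows standing in for a field; all values are made up.
rows = [
    {"region": "A", "county": "X", "population": 1200, "income": 250.0},
    {"region": "B", "county": "X", "population": 800, "income": 310.0},
    {"region": "C", "county": "Y", "population": 1500, "income": 190.0},
]


def reduce_rows(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def sort_rows(rows, key):
    """Return the rows ordered by one attribute."""
    return sorted(rows, key=lambda r: r[key])


def aggregate(rows, group_key, value_key):
    """Sum one attribute per group."""
    totals = {}
    for r in rows:
        totals[r[group_key]] = totals.get(r[group_key], 0) + r[value_key]
    return totals


def derive(rows, name, equation):
    """Add a new attribute computed by a user-supplied equation."""
    return [dict(r, **{name: equation(r)}) for r in rows]


large = reduce_rows(rows, lambda r: r["population"] >= 1000)
by_income = sort_rows(rows, "income")
per_county = aggregate(rows, "county", "population")
with_ratio = derive(rows, "income_per_capita",
                    lambda r: r["income"] / r["population"])

print([r["region"] for r in large])      # ['A', 'C']
print([r["region"] for r in by_income])  # ['C', 'A', 'B']
print(per_county)                        # {'X': 2000, 'Y': 1500}
```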

3.2.4 Visualization

In the final step of the pipeline, the processed data field is connected to a visualization component. OpenViz contains a large set of visualization components, representing different techniques. Among them are basic chart types, such as bar chart and scatter chart, and more complex techniques, such as parallel coordinates and iso-surfaces. The visualization components are connected to a viewer component that handles the graphic objects generated as a result of the pipeline. Any number of visualization components can be combined and added to the viewer to create complex multi-view visualizations of the same data.

19 ADODB - ActiveX Data Object DataBase

20 JDBC - Java Database Connectivity

The visualization components connected to the viewer constitute the scene tree. The scene tree is a hierarchical collection of nodes called scene nodes, each node representing a specific visualization component. In addition to the visualization components, a number of other components can be nodes in the scene tree. Among these are components for dividing the viewer into separate containers and for the presentation of legends describing, for example, a colour mapping.

3.3 Shape files [7]

Shape is a database format from ESRI21, which holds the shape, the spatial location, and attribute information of geographic features in a data set. The shape is described by a set of vector coordinates.

A shape file actually consists of three similarly named files: a main file, an index file and a database table. Each file contains a header list and some sort of data list.

Main file (.shp) contains the geographic information. Each record describes a shape by setting the type, for example, point, line or polygon, and a list of vertices.

Index file (.shx) is used to connect the main file and the dBase table. It contains the record offsets of the corresponding main file records.

dBase file (.dbf) holds the attributes with one record per feature. There is a one-to-one relationship between the attribute record and the associated shape record.

The geometry stored in a shape file is non-topological, which can be limiting, but this also means faster drawing speed and less disk space than other similar data sources.
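To make the file layout more concrete, the following Python sketch parses the fixed 100-byte header of a main file and the records of an index file. The byte offsets and byte orders are taken from the published ESRI shapefile specification rather than from this report, the function names are hypothetical, and the input is a synthetic header built in memory so that the sketch runs without a real map file. GeoWizard itself reads shape files through the VzReadShapefile component, not through code like this.

```python
import struct

SHAPE_TYPES = {0: "Null Shape", 1: "Point", 3: "PolyLine",
               5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header):
    """Decode the fixed-size header of a main (.shp) file."""
    if len(header) < 100:
        raise ValueError("the main file header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])        # big-endian
    if file_code != 9994:
        raise ValueError("unexpected file code: %d" % file_code)
    (length_words,) = struct.unpack(">i", header[24:28])   # 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, "other (%d)" % shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }


def parse_shx_records(body):
    """Decode index records as (offset, content length) pairs in bytes."""
    count = len(body) // 8
    return [tuple(2 * words for words in struct.unpack(">ii", body[8 * i:8 * i + 8]))
            for i in range(count)]


# Synthetic header: polygon file, bounding box (0, 0) to (10, 20).
demo_header = (struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
               + struct.pack("<ii", 1000, 5)
               + struct.pack("<4d", 0.0, 0.0, 10.0, 20.0) + bytes(32))
demo_index = struct.pack(">ii", 50, 64)

info = parse_shp_header(demo_header)
print(info["shape_type"], info["bbox"])
print(parse_shx_records(demo_index))
```

Each index record gives the position of one main file record, which is what allows a reader to jump directly to the shape belonging to a given attribute row.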

21 ESRI - Environmental Systems Research Institute

Chapter 4 GeoWizard

Using the tools and technologies described in the previous section, a specialized application for interactive exploration of geographically referenced statistical data, called GeoWizard, was developed. The application is Windows-based and uses data from Statistics Sweden. The programming has been done in C# as it provides a good balance between performance and ease of use. Its Java-like syntax differentiates it from VB.NET’s more unorthodox programming syntax, and was the main reason it was chosen for this project.

When developing any application it is important to consider the needs and characteristics of the intended users. This being a diploma work intended to investigate the possibilities of geospatial visualization, a specific target user did not exist. However, to guide the development, a hypothetical target user was specified as a moderately advanced Windows user with a small amount of domain knowledge, that is, the ability to interpret the visualized statistics.

The following requirements were set for the application:

Usable: The application should be easy to use and adhere to established user interface principles. The GUI should feel intuitive and preferably familiar to the intended target user group.

Clean: Unnecessary screen clutter should be avoided. Ideally, every object on the screen should either provide information to the user or perform a specific function, or both.

Exploratory: The application should facilitate exploration of multi-dimensional data sets, and provide insight into the data that would be hard or impossible to achieve otherwise.

Distributable: The application should be easy to distribute. With minimal effort it should be possible to install and run the application on any standard Windows system. If possible, the application should be small enough to be distributed over the Web, for example, as a downloadable file on a website or an e-mail attachment.

Responsive: The application should feel responsive, with user interactions resulting in direct feedback from the system. The multiple views should be tightly linked, with changes in one view instantly reflected in all other views.

Flexible: Although the application has been created with a specific use in mind, namely the analysis of social science data from Statistics Sweden, it should be easy to modify it to show data for another country by just changing the map and data files.

These requirements were used as guiding principles in the development and influenced decisions on the design of the GUI, the choice of visualization techniques and the system architecture.

4.1 Graphical User Interface

The GUI of GeoWizard has the standard look-and-feel of a regular Windows application. This is a beneficial side effect of using the .NET Framework and Visual Studio for the development. The application starts in a maximized state, to make full use of the available screen space, but the user can dynamically resize the window to suit her specific needs. A standard main menu contains functions for closing the application, opening another data set and accessing the embedded help files. The current data set is indicated by the file path in the status bar (see figure 4.1).

4.1.1 Task-based tabs

Early on in the design process of the GUI, it was decided that, to provide the optimal set of views for a specific problem or scenario, a task-based approach should be utilized. The general idea behind this approach is that one set of views supports one task, for example, high-level exploration of the data, while another set of views supports another task, for example, analysis of the temporal characteristics. The benefit is that instead of trying to force everything into the same set of views, probably resulting in a cluttered and less intuitive interface, the customized views can be fine-tuned for each task.

To support the task-based approach, the GUI contains a tab control where each tab represents a specific task. Tabs can easily be added and removed, making the GUI flexible and scalable. For example, should it be decided later on that a view showing a group of maps arranged in a matrix would be beneficial for analyzing a geospatial attribute over a period of time, it could easily be added to the interface without changing the current implementation.

In the current version of GeoWizard, only the tab for exploration of the data set has been implemented. This tab is divided into four linked views separated by interactive splitters allowing the user to change the layout to her preference.

4. GeoWizard

Figure 4.1: The GeoWizard GUI contains (a) a choropleth map view, (b) a parallel coordinates browser, (c) a scatter plot matrix view, and (d) a scatter plot view. The currently selected data set is shown in the bottom left corner and a standard main menu at the top provides basic functions such as opening a new data set and accessing the help files.

More features: 1) Dynamic splitters allow the user to change the layout to her preference. 2) Range sliders connected to each axis facilitate dynamic queries. 3) Context-sensitive pop-up menus provide functionality for controlling all views. 4) The map toolbar contains tools for manipulating the map view.

4.1.2 Multiple coordinated dynamic views

Using multiple views is a powerful method to provide several different perspectives on the same data. Different views can show correlations and disparities between attributes in the data set that would be hard or impossible to discover in a single view system. In a multiple view system, a user can visually compare different views of the same data, whereas in a single view system, the user would need to memorize one view of the data and then mentally compare it to another view. By allowing users to recognize patterns instead of recalling information, their perceptual capabilities are leveraged.

[...] further improve the value of an infovis system. In a coordinated view system, the multiple views of the same data are tightly coupled to enable users to rapidly investigate the data from different perspectives. The user can analyze multi-dimensional data and derive a deeper understanding of compound properties through the data correlation in a coordinated view system.

A common approach in multiple view systems is to display each view in a separate child window of the main GUI and allow the user to arbitrarily arrange the windows. The approach used in GeoWizard is to instead use a single coherent GUI window and link the different views to each other so that resizing one view dynamically resizes all the other views. This approach maximizes the use of available screen area but can require a lot of programming to keep the sizes of the views synchronized. Fortunately, Visual Studio provides an easy way to manage the layout of the views with the anchor and dock properties described in section 3.1.2 on page 15. To recap: anchor specifies how the location of a component should relate to its parent’s edges, while dock specifies whether a component should dock to any of its parent’s edges. Using these properties combined with splitter components (see figure 4.1-1), the views can be made dynamically resizable.

To coordinate data between views in GeoWizard, a two-pronged approach is taken. Whenever possible, the field data-linking method employed by OpenViz is used. In this method, the components use the same input field and any changes made to the input field propagate to the visualization components. When this method cannot be used, due to limitations in OpenViz or the system architecture of GeoWizard, a low-level method is instead utilized. In this method, arrays of indices are sent between the visualization components. The indices are essentially "keys" for the regions and are based on the row number in the original Excel file. For example, the region "Upplands Väsby" is represented by the first row and therefore has index 0. A specific region has the same index in all views, which can be used to coordinate the data. The field-linking method is used to synchronize the colour map, while the index array method is used for the dynamic queries described in section 4.3 on page 25.
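The index array method can be sketched as follows. This Python sketch is not the GeoWizard implementation; the classes View and Coordinator, the method names and all region names except Upplands Väsby are hypothetical. It only illustrates that a shared row index is enough to keep several views in step.

```python
class View:
    """Hypothetical view that remembers which region indices are active."""

    def __init__(self, name):
        self.name = name
        self.active = set()

    def on_indices(self, indices):
        self.active = set(indices)


class Coordinator:
    """Forwards an index array to every registered view."""

    def __init__(self):
        self.views = []

    def register(self, view):
        self.views.append(view)

    def send(self, indices):
        for view in self.views:
            view.on_indices(indices)


# Row order of the spreadsheet defines the index of each region.
region_names = ["Upplands Väsby", "Region B", "Region C"]
index_of = {name: row for row, name in enumerate(region_names)}

coordinator = Coordinator()
map_view, pcb_view, scatter_view = View("map"), View("pcb"), View("scatter")
for v in (map_view, pcb_view, scatter_view):
    coordinator.register(v)

# A query result expressed as indices is sent to all views at once.
coordinator.send([index_of["Upplands Väsby"], index_of["Region C"]])
print(sorted(map_view.active))  # [0, 2]
```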

4.1.3 Context-sensitive menus

To reduce screen clutter, while still providing an easy way to interact with the application, most of the functions have been hidden in context-sensitive pop-up menus. Clicking on the right mouse button in the map view brings up a menu with functions related to the map, while a right click on the axis labels in the PCB1 brings up another menu. The use of context menus associated with a right button click is consistent with the standard behaviour of Windows applications and should be familiar to most users.

1 PCB - Parallel Coordinates Browser

4.2 Visualization techniques

Following the task-based approach, a customized set of visualizations for the data exploration task has been implemented. Selection, filtering and colour mapping are coordinated in all visualizations. The colour scheme used for the colour mapping follows the guidelines presented on the ColorBrewer website [2]. It is divided into two classes, each with a continuous tone colour spectrum, making it both sequential and diverging. Overview is provided by the parallel coordinates and the scatter plot matrix, while the choropleth map and the 2D scatter provide more detailed views of the data. Scene trees for the visualizations can be found in appendix A and a view of the GUI is found on page 21 or as a sequence in section 5.

4.2.1 Parallel Coordinates Browser

Figure 4.2: A column (attribute) in the Excel spreadsheet is assigned to a PCB axis. The axis label font defines the state of the attribute: normal = not active, bold = used in colour map, italic = used in 2D scatter.

Parallel coordinates is a technique pioneered in the 1980s [12] that has since been used in many multiple view geovis environments. The regions in the data set are represented as a series of unbroken line segments, or polylines, passing through parallel axes, each representing a single attribute. A PCB and its connection to an Excel spreadsheet are illustrated in figure 4.2. The value of an attribute for a specific region is defined by the polyline’s intersection with the axis. The ends of the attribute axis represent the maximum and minimum values for all regions. A single polyline forms a visual representation of the characteristics of one municipality. Differences between selected municipalities can easily be spotted by visually comparing the polylines representing them. The number of attributes that can be visualized is restricted only by the horizontal resolution of the containing window. The common problem of high line density, whereby single lines can be hard to distinguish, is partly reduced by the limited number of municipalities (290).
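The mapping from attribute values to a polyline can be sketched in a few lines. The following Python sketch assumes equally spaced axes and a linear scale between the minimum and maximum of each attribute; the function name, attribute names and values are hypothetical, and the actual rendering in GeoWizard is handled by OpenViz.

```python
def to_polylines(table):
    """Turn a table (attribute name -> list of values, one per region)
    into one polyline per region, as (x, y) points in the unit square."""
    attributes = list(table)
    n_axes = len(attributes)
    n_regions = len(table[attributes[0]])
    lines = [[] for _ in range(n_regions)]
    for axis, name in enumerate(attributes):
        values = table[name]
        lo, hi = min(values), max(values)
        # Axes are spread evenly from x = 0 to x = 1.
        x = axis / (n_axes - 1) if n_axes > 1 else 0.0
        for i, v in enumerate(values):
            # The axis ends correspond to the minimum and maximum values.
            y = (v - lo) / (hi - lo) if hi > lo else 0.5
            lines[i].append((x, y))
    return lines


# Made-up values for three regions and two attributes.
table = {"tax_rate": [30.0, 32.0, 34.0], "income": [200.0, 400.0, 300.0]}
for line in to_polylines(table):
    print(line)
# [(0.0, 0.0), (1.0, 0.0)]
# [(0.0, 0.5), (1.0, 1.0)]
# [(0.0, 1.0), (1.0, 0.5)]
```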

The distinguishing feature of the GeoWizard PCB implementation is its ability to serve as a control panel for easier identification of multivariate relationships across spatial domains in the map. The PCB provides functionality for dynamic queries and for controlling the visualizations in the other views. This functionality is accessible via embedded controls; range sliders for the dynamic queries and interactive axis labels for controlling visualizations.

The axis labels function as buttons with two actions, one for each mouse button. A click on the left mouse button changes the colour map to represent the related attribute, while a click on the right button brings up a context-sensitive menu (see figure 4.1-3 on page 21). With the menu, the user can change attributes for the colour map and scatter plot. To give the user an indication of which attributes are active, the labels have three different states:

• when the attribute is used for the colour mapping the label text is set in bold type and the background colour is the same as in splitters and the map title bar,

• when the attribute is used in the scatter plot the label text is set in italic type,

• when the attribute is not active the label is set in normal type.

4.2.2 Choropleth map

The map view contains a title bar, an overview map, a colour legend, a choropleth map and a toolbar. The title bar shows the name of the coloured attribute and the current filter status in absolute numbers and percentage. The background colour matches the active axis label in the PCB, providing a subtle visual connection between the two views. The overview map is visible at all times, providing context when the user has zoomed in on a region in the map. The colour legend, while functioning as a regular legend providing guidance for interpreting the colours, can also be used to dynamically alter the colour mapping (described in more detail in section 4.4.2). The choropleth map shows the geographic distribution of a specific attribute. In choropleth maps (also referred to as thematic maps), each region is filled with a colour or pattern corresponding to a value, making it easy to discover geographic patterns in the data. The toolbar, seen in figure 4.1-4, contains several tools for manipulation of the map view: the common navigation functions zoom, rotate, pan and reset, as well as three mode buttons. The mode buttons control paint, dynamic colour range and 3D mode. The paint function is an improved selection and the dynamic colour range is used to adjust the colour map. They are both presented in more detail in section 4.4, interaction techniques. If the 3D mode is on, the user can visualize an attribute as bars in the map, as seen in figure 4.1.

4.2.3 Scatter plot matrix

The scatter plot matrix view (see figure 4.1-c) shows all possible two-dimensional scatter plots in the data set. The plots are arranged by varying the x-attribute for each row and the y-attribute for each column. The first scatter plot, at the top left, plots the first attribute in the data set on both the x- and y-axis, which produces a straight line. The next one plots the first attribute on the x-axis and the second attribute on the y-axis, and so on. A subtle grid separates the scatter plots and provides a frame of reference for each plot. To avoid unnecessary screen clutter, tooltips (see section 4.4.4) are used to show the x- and y-attributes for each plot. The matrix view is intended as a complement to the PCB in providing the user with an overview of the data set. Interesting correlations between two attributes can easily be discovered and studied in more detail in the scatter plot view. To indicate the currently selected plot, the background colour is slightly brighter and the surrounding border slightly darker than the other plots.
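The arrangement of the matrix can be expressed as a nested enumeration of attribute pairs. The Python sketch below follows the ordering described above (x-attribute per row, y-attribute per column); the function name and the attribute names are hypothetical.

```python
def matrix_cells(attributes):
    """Return a grid where cell [row][column] holds the (x, y) attribute
    pair plotted in that position of the scatter plot matrix."""
    return [[(x_attr, y_attr) for y_attr in attributes]
            for x_attr in attributes]


cells = matrix_cells(["tax_rate", "income", "population"])

print(cells[0][0])  # ('tax_rate', 'tax_rate'): same attribute on both axes
print(cells[0][1])  # ('tax_rate', 'income')
print(sum(len(row) for row in cells))  # 9 plots for 3 attributes
```

With n attributes the matrix holds n × n plots, and the n plots on the diagonal show an attribute against itself, which is why they appear as straight lines.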

4.2.4 Scatter plot

While the scatter plot matrix shows two-dimensional correlations between all attributes, the detailed scatter plot view (see figure 4.1-d) visualizes a third attribute by using the same colour scheme as the choropleth map and the PCB. Whereas the parallel coordinates and the scatter plot matrix impose a certain cognitive load on the user, scatter plots are highly intuitive. Most users can quickly grasp the relationship between two attributes by studying the pattern created by the plot.

4.3 Dynamic queries

Dynamic queries is an approach to data exploration that provides users with a simple way of exploring large data sets rapidly. A query is adjusted with a direct manipulation control, usually a slider, and the associated visualization is updated in real-time (<100 ms). Dynamic queries can be thought of as a way of directly querying a database through user interface controls, without writing complex SQL2 statements.

In GeoWizard, the range sliders attached to each axis in the PCB are used to set a range for each attribute in the data set, allowing the user to dynamically formulate complex queries. Regions that fall outside the set of ranges are still visible, albeit with a significantly lower opacity. This way, context is maintained without the uninteresting regions being too visually distinct. The results from the dynamic queries are reflected in all views in real-time to preserve temporal continuity.

With the range sliders attached to each axis, the user can interactively formulate dynamic queries. Moving the handles at the top and bottom of the axis controls the range of a selected statistical attribute. AND operations are performed by combining several range slider queries, for example, average income AND medium age, and the corresponding subsets of the data are filtered out.
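The combination of range sliders into a single AND query can be sketched as follows. The Python sketch assumes that each slider contributes a closed interval and that regions outside the query are dimmed rather than removed; the function name, the opacity value 0.2, and the attribute names and values are assumptions made for illustration.

```python
def apply_queries(rows, ranges, dimmed_opacity=0.2):
    """Return one opacity per row: 1.0 if the row lies inside every
    range (logical AND), otherwise the dimmed opacity."""
    opacities = []
    for row in rows:
        inside = all(lo <= row[attr] <= hi
                     for attr, (lo, hi) in ranges.items())
        opacities.append(1.0 if inside else dimmed_opacity)
    return opacities


# Made-up regions; the index in the list is the region index.
rows = [
    {"income": 250.0, "age": 40.0},
    {"income": 320.0, "age": 41.0},
    {"income": 260.0, "age": 55.0},
]

# Two sliders: income between 200 and 300 AND age between 35 and 45.
ranges = {"income": (200.0, 300.0), "age": (35.0, 45.0)}
print(apply_queries(rows, ranges))  # [1.0, 0.2, 0.2]
```

Returning an opacity for every region, rather than a reduced list, keeps the filtered-out regions visible as context.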

4.4 Interaction techniques

Interaction is an important aspect of infovis. The idea behind the interaction techniques in GeoWizard is that users interact directly with the on-screen visualization without dependence on more traditional GUI controls, like buttons, drop-down lists, etc. This allows the user to take a more active role in the process of visualizing and investigating data. A data-centric model, in which the user responds to and interacts with the actual visual representation of the data, is used.

4.4.1 Brushing and highlighting

Brushing and highlighting provide the necessary bridge between the different views. These techniques assist the user in providing a geographical context to the data and ease the cognitive load on the user by explicitly linking objects in different views. Brushing over a line in the PCB or a glyph in the scatter plot displays a label showing the region name in the map view.

The user highlights a particular municipality with a mouse click. This feature isolates the focused region and gives the user a quick overview of all the attributes. The coordination is omni-directional, which means that an object selected in one view is highlighted with the same colour in all the other views. Multiple selection is implemented which allows the user to highlight a number of municipalities at the same time. The highlighted regions are coloured dark grey, thereby losing the information from the colour map.

To keep the colour map colours, and just focus on the selected regions, the user can go into paint mode. The paint tool functions as a sort of inverse filter: instead of filtering out a subset of the complete data set, the paint tool starts with all regions filtered out and the user can then filter in specific regions by “painting”, or selecting, them. This is illustrated in figure 5.6 in the case study section. The paint tool can be used to study characteristics of entire regions, for example, urban areas versus countryside.

4.4.2 Adjusting the colour map

The objects in all views are coloured based on a certain attribute, controlled by the user via the PCB. A colour map legend is attached to the map. The legend not only serves as a guide for interpreting the colours; it can also be used to adjust the colour map. It allows the user to change the threshold, or the class boundary, for the two separate colour spectrums. The map is dynamically redrawn to reflect any changes. This way, it is easier to see variations in the data. Andrienko and Andrienko have described this concept as dynamic painting in [1].

If the dynamic colour range mode is enabled, the colour map is automatically updated according to the current minimum and maximum values of the attribute. The range extents, that is, the maximum and minimum values, are controlled in the PCB using the range sliders. The dynamic colour range function is useful when filtering the data set to better see variations within the data. This is illustrated in figure 5.4 in section 5.
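A two-class colour map with an adjustable threshold and a dynamic range can be sketched as below. The Python sketch assumes linear interpolation within each class and a shared neutral colour at the threshold; the RGB values are illustrative and are not the colours used in GeoWizard, and the function names are hypothetical.

```python
LOW_END = (33, 102, 172)     # assumed colour at the minimum value
NEUTRAL = (247, 247, 247)    # assumed colour at the threshold
HIGH_END = (178, 24, 43)     # assumed colour at the maximum value


def lerp(c0, c1, t):
    """Linear interpolation between two RGB colours, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def colour_for(value, vmin, vmax, threshold):
    """Map a value to a colour using two spectra split at the threshold."""
    value = min(max(value, vmin), vmax)          # clamp to the range
    if value < threshold:
        span = threshold - vmin
        t = (value - vmin) / span if span else 1.0
        return lerp(LOW_END, NEUTRAL, t)
    span = vmax - threshold
    t = (value - threshold) / span if span else 0.0
    return lerp(NEUTRAL, HIGH_END, t)


# Full data range 0..100 with the class boundary at 50.
print(colour_for(0.0, 0.0, 100.0, 50.0))     # (33, 102, 172)
print(colour_for(50.0, 0.0, 100.0, 50.0))    # (247, 247, 247)
print(colour_for(100.0, 0.0, 100.0, 50.0))   # (178, 24, 43)

# Dynamic colour range: the slider extents 40..60 replace the full range,
# so the value 60 now receives the colour of the maximum.
print(colour_for(60.0, 40.0, 60.0, 50.0))    # (178, 24, 43)
```

Moving the threshold changes where the two spectra meet, while narrowing the range spreads the available colours over fewer values, which makes small differences within a filtered subset easier to see.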

4.4.3 Zooming

Due to differences in the size of the regions in the map, a zoom and pan function has been implemented. Zooming is done by pressing the zoom button in the toolbar and then moving the mouse cursor forward to zoom in, or backward to zoom out. Panning works similarly but is enabled by the pan button instead. A fast zoom and pan to a specific area can be achieved by dragging a rectangle around the area of interest while the zoom mode is enabled. Using the zoom function, it is easy to focus on geographic subsets, such as the southern part of Sweden.

4.4.4 Tooltips

A tooltip is a small descriptive pop-up label that appears whenever the user hovers over an object for a short amount of time. It is used to save space; information that is not essential can be hidden and presented to the user on demand. The technique is related to brushing in that they are both triggered by the mouse cursor without the need to click on an object. Tooltips, however, have a slightly different use. A tooltip is associated with a specific control and provides descriptive text on that control, whereas brushing shows the value of a specific object in the visualization.

4.5 System architecture

The general concept influencing the system architecture of GeoWizard is the component-oriented approach to software development. In a component-based system, a set of components, each encapsulating a specific functionality, are combined to create higher-level components. The object-oriented design of the .NET Framework and OpenViz class libraries described earlier naturally lends itself to this type of approach. The specific system architecture of GeoWizard can be described as a set of layers, where each layer builds upon the lower layers.

4.5.1 Layered component architecture

Atomic components are those very low-level, high-performance components, typically dependent on the underlying data structure, that constitute the core of an efficient application. A rich set of component resources with fine-grained control allows precise matching to user requirements. Atomic components are used for developing functional components and scalable, customizable application (high-level) components. Atomic components from independent .NET sources, including data readers, GIS, computational analysis and clustering, can easily be integrated.

Functional components are the middle-tier components that are constituted by the combination of one or more atomic components. These components typically implement the general functionality of high-profile visualizations, such as the PCB or the scatter plot matrix.

Application components are the components that are constituted by the combination of one or more functional components, and are user-accessible applications such as GeoWizard.

In the layered architecture, a set of low-level atomic and functional components, each one performing a specific task in the overall data exploration process, is put together into high-level application components (see figure 4.3). The GeoWizard functional and application components are based on the atomic component-level .NET Framework and OpenViz class libraries. Components in the OpenViz library are prefixed by "Vz" to distinguish them from regular .NET components.

Figure 4.3: Layered component-oriented system architecture. Each layer builds upon the underlying layers.
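The layering can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not GeoWizard code; all function and class names (read_values, column_range, AxisScaler, ExplorerApp) are hypothetical and only show how each layer is assembled from the one below it.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


# Atomic layer: small, single-purpose building blocks (hypothetical names).
def read_values(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Atomic component: copy raw rows into a plain numeric table."""
    return [[float(value) for value in row] for row in rows]


def column_range(table: List[List[float]], col: int) -> Tuple[float, float]:
    """Atomic component: minimum and maximum of one column."""
    values = [row[col] for row in table]
    return min(values), max(values)


# Functional layer: combines atomic components into one reusable function.
@dataclass
class AxisScaler:
    """Functional component: scales one column of the table to [0, 1]."""
    table: List[List[float]]

    def scaled(self, col: int) -> List[float]:
        lo, hi = column_range(self.table, col)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        return [(row[col] - lo) / span for row in self.table]


# Application layer: the user-facing component built from functional components.
class ExplorerApp:
    """Application component: wires data reading and scaling together."""

    def __init__(self, rows: Sequence[Sequence[float]]) -> None:
        self.scaler = AxisScaler(read_values(rows))

    def axis(self, col: int) -> List[float]:
        return self.scaler.scaled(col)
```

In this sketch the application layer never touches the atomic functions directly, which mirrors the idea that each layer builds only on the layers beneath it.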

4.5.2 System design

The component architecture provides the basis for the system design, that is, how the functional and atomic components should be combined to create the application component. To separate the functional components from the GUI, the system is divided into two general parts: the GSViz class library and the GUI. In the terminology used in Visual Studio, GeoWizard is a solution containing two projects. The motivation is that separating form from function ensures that the functional components can be reused.

The functional components are organized into groups of related components by using what are called namespaces in .NET. A namespace is a hierarchical way of organizing class names. For example, the class library containing all the functional components in GeoWizard has the root namespace GSViz. The reason the namespace is GSViz and not GeoWizard is that GSViz was the original working name for the project. Classes contained in GSViz have namespaces representing their general functionality; the classes handling data reading and storage, for example, are contained in the GSViz.Data namespace. All the namespaces defined in GeoWizard are:

GSViz.Data

ExcelReader: Extracts the data values from an Excel file via an OLEDB[3] connection. OLEDB is a data access component created by Microsoft, which provides access to a large number of different data source types, for example, Access databases and OLAP[4] servers. By relying on the OLEDB component for the data source connection, flexibility regarding input data can be ensured. The ExcelReader component can easily be modified to extract data from any other OLEDB data source by just changing a connection string.

VizData: Converts the raw data from the ExcelReader component to an OpenViz field and stores it. It also contains string arrays with all the region and data attribute names.
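The remark that ExcelReader can target another data source by changing only the connection string can be sketched as follows. This is an illustration in Python rather than GeoWizard's own code. The thesis does not state which OLEDB provider is used, so the Jet 4.0 provider and the "Excel 8.0" extended property are assumptions, and the function name oledb_connection_string is hypothetical.

```python
def oledb_connection_string(path: str, kind: str = "excel") -> str:
    """Build an OLE DB connection string for an Excel or Access file.

    Assumption: the Microsoft Jet 4.0 provider, which handles both file types.
    Only the connection string differs between the two sources.
    """
    parts = ["Provider=Microsoft.Jet.OLEDB.4.0", f"Data Source={path}"]
    if kind == "excel":
        # Excel workbooks need an extra property naming the file format;
        # HDR=Yes treats the first row as column headers.
        parts.append('Extended Properties="Excel 8.0;HDR=Yes"')
    elif kind != "access":
        raise ValueError(f"unsupported data source kind: {kind!r}")
    return ";".join(parts) + ";"
```

The reading code that consumes the connection would stay the same in both cases; only the string passed to it changes.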

GSViz.Functions

Algebra: A utility component for basic algebra functions. Currently, only a function for transforming coordinates from one coordinate system to another is implemented. It is mostly used to transform points in screen coordinates, which are in pixel values, to OpenViz coordinates, which usually range from (-1,-1) to (1,1).
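The coordinate transformation described for the Algebra component amounts to a linear mapping per axis. The following Python sketch shows one way to write it. It is not the GeoWizard implementation: the function names are hypothetical, and the assumption that the y-axis is flipped (screen y growing downward, normalized y growing upward) is not stated in the thesis.

```python
from typing import Sequence, Tuple


def transform_point(
    point: Sequence[float],
    src_a: Sequence[float],
    src_b: Sequence[float],
    dst_a: Sequence[float],
    dst_b: Sequence[float],
) -> Tuple[float, ...]:
    """Linearly map a point so that corner src_a lands on dst_a and src_b on dst_b."""
    return tuple(
        d0 + (p - s0) * (d1 - d0) / (s1 - s0)
        for p, s0, s1, d0, d1 in zip(point, src_a, src_b, dst_a, dst_b)
    )


def screen_to_normalized(px: float, py: float, width: float, height: float) -> Tuple[float, ...]:
    """Map a pixel position to the square from (-1,-1) to (1,1).

    Assumption: the bottom-left pixel corner (0, height) maps to (-1,-1)
    and the top-right corner (width, 0) maps to (1,1), which flips the y-axis.
    """
    return transform_point((px, py), (0, height), (width, 0), (-1.0, -1.0), (1.0, 1.0))
```

For example, the centre of an 800 by 600 pixel window, (400, 300), maps to (0.0, 0.0).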

GSViz.Visualizations

IGSVisualization: An interface class, implemented by all the classes in the visualization namespace. Interface classes can be thought of as template classes describing what characteristics a class should have, for example, what functions and variables should be implemented. A class that implements the interface agrees to implement all of the methods defined in the interface, thereby agreeing to certain behaviour. The reason for using interface classes is that they provide a higher level of abstraction that can be used to write more generalized methods. In GeoWizard, messages are sent between the visualization components to coordinate the views so that the data is in the same state in all views (for example, the same region is selected in all views). By using interfaces, the coordination methods can be generalized to handle IGSVisualization objects instead of having to handle all the different types of visualization components.
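The benefit of coordinating views through a shared interface can be sketched as below. The example uses a Python abstract base class as a stand-in for a .NET interface. The members of IGSVisualization are not listed in this section, so the method select_region and the classes MapView, ScatterPlotView and Coordinator are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Optional


class GSVisualization(ABC):
    """Stand-in for an interface such as IGSVisualization (members are assumed)."""

    @abstractmethod
    def select_region(self, region: str) -> None:
        """Bring this view into the state where `region` is selected."""


class MapView(GSVisualization):
    def __init__(self) -> None:
        self.selected: Optional[str] = None

    def select_region(self, region: str) -> None:
        self.selected = region  # a real map view would also redraw the region


class ScatterPlotView(GSVisualization):
    def __init__(self) -> None:
        self.highlighted: List[str] = []

    def select_region(self, region: str) -> None:
        self.highlighted = [region]  # a real plot would highlight the point


class Coordinator:
    """Keeps all views in the same state using only the shared interface."""

    def __init__(self, views: Iterable[GSVisualization]) -> None:
        self.views = list(views)

    def broadcast_selection(self, region: str) -> None:
        # One loop handles every view type, present or future.
        for view in self.views:
            view.select_region(region)
```

Because the coordinator depends only on the interface, a new visualization type can be added without changing the coordination code, as long as it implements the same methods.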

[3] OLEDB: Object Linking and Embedding, Database
[4] OLAP: Online Analytical Processing


