Statistical flow data applied to visual analytics


LiU-ITN-TEK-A--11/051--SE. Statistical flow data applied to geovisual analytics. Phong Hai Nguyen, 2011-08-31. Department of Science and Technology, Linköping University, SE-601 74 Norrköping, Sweden.

Master's thesis in Media Technology, carried out at the Institute of Technology, Linköping University. Author: Phong Hai Nguyen. Examiner: Mikael Jern. Norrköping, 2011-08-31.

Copyright
The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/. © Phong Hai Nguyen.

Abstract
Statistical flow data, such as commuting, migration, trade and money flows, have attracted great interest from policy makers, city planners, researchers and ordinary citizens. Numerous statistical data visualisations have appeared; however, applications for visualising flow data are scarce. Moreover, among these few applications, some are standalone and intended only for experts, some do not support interactive functionality, and some can only provide an overview of the data. Therefore, in this thesis, I develop a web-enabled, highly interactive statistical flow data visualisation application with analysis support that addresses all of these challenges. The application is implemented on GAV Flash, a powerful interactive visualisation component framework, so it is inherently web-enabled and has basic interactive features. It uses a visual analytics approach that combines data analysis and interactive visualisation to solve the cluttering issue, that is, the problem of flows overlapping on the display. A variety of analysis means is provided to analyse flow data efficiently, including analysing both flow directions simultaneously, visualising time-series flow data, finding the most attracting regions and explaining the reasons behind derived patterns. The application also supports sharing knowledge between colleagues through a story-telling mechanism that allows users to create and share their findings as a visualisation story. Last but not least, the application enables users to embed the visualisation based on a story into an ordinary web page, giving the public a good opportunity to gain insight into official statistical flow data.


Acknowledgments
First and foremost, I would like to express my deep gratitude to Professor Mikael Jern, who approved my thesis, suggested many great ideas and helped me keep the project on track. Second, I am indebted to PhD student Quan Ho. Thanks to his enthusiasm, his inspiration and his great efforts to explain the GAV Flash framework, to teach me some tricky coding techniques and to discuss problems with me when I had difficulties, I was able to mature enough to complete my thesis successfully. I am grateful to Tobias Åström for helping me improve the application interface and especially for guiding me on vislets, one of the most important parts of my thesis. Last but not least, I would like to give special thanks to my parents and my brother, who love me and have supported me financially. I wish to thank my girlfriend Hien, whose patience and love encouraged me to complete this work. To them I dedicate this thesis.


Contents
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Methods
  1.4 Thesis Outline
2 Related Work
3 Background
  3.1 Visual Analytics
    3.1.1 Definition
    3.1.2 Application
  3.2 GAV Flash
    3.2.1 Hierarchical Architecture
    3.2.2 Data Model
    3.2.3 Performance
    3.2.4 Interactive Features
    3.2.5 Sharing Knowledge
  3.3 Model-View-Controller Design Pattern
  3.4 Flow Visualisation
  3.5 Interpolation
4 Approach
  4.1 Web-based Application with Real-time Interaction
  4.2 Cluttering Issue
    4.2.1 Flow Appearance Method
    4.2.2 Flow Analysis Method
  4.3 Extensive Analysis
    4.3.1 Back-and-Forth Flows Analysis
    4.3.2 Time-series Flow Data Visualisation
    4.3.3 Finding Most Attracting Regions
    4.3.4 Reasoning Flow Patterns
  4.4 Sharing Knowledge
    4.4.1 Snapshot
    4.4.2 Vislet
  4.5 Large Dataset
    4.5.1 Problem
    4.5.2 Divide-and-Conquer Solution
5 Implementation
  5.1 Application Architecture
    5.1.1 Flow Map Model
    5.1.2 Flow Map Views
    5.1.3 Flow Map Controller
  5.2 Vislet
    5.2.1 Snapshot
    5.2.2 Vislet
    5.2.3 Flow Map Vislet
6 Results
  6.1 Statistic Problem
  6.2 Problem Solving Using the Application
    6.2.1 Which Norwegian counties commute to Stockholm most?
    6.2.2 Compare (*) with Norwegian counties commuting to Västra Götaland most
    6.2.3 Compare flows in (*) with their inverse flows
    6.2.4 Compare flows in (*) with most noticeable commuting flows between two countries
    6.2.5 Why are the flows towards to Norway so dominant?
    6.2.6 Which counties attract people most?
    6.2.7 What is the trend of people commuting between Oslo and Stockholm?
    6.2.8 Story-telling and Vislet
7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
Bibliography

Chapter 1. Introduction
In this chapter, I give an overview of the thesis. First, Section 1.1 explains the importance of flow data and the need for a web-enabled, highly interactive statistical flow data visualisation application with analysis support. Next, Section 1.2 describes the goals of the thesis. Section 1.3 introduces the general ideas of the methods used to fulfil these objectives. Finally, Section 1.4 explains the structure of this thesis report.

1.1 Motivation
Data that represent a movement from an origin to a destination are called flow data. Examples of flow data include commuting, migration, trade and money flows, all of which have drawn much attention from policy makers, city planners, researchers and ordinary citizens. Flow data are two-dimensional relationship data (every origin is paired with every destination), so datasets are normally large. Data on commuting among Swedish municipalities serve as a good example. Sweden has 290 municipalities, so there are 290 × 289 = 83,810 possible directed commuting flows between pairs of municipalities. Reading a table of tens or hundreds of thousands of records to find valuable information is tedious and very difficult. Therefore, a tool is needed that analyses such massive data and visualises the findings so that viewers can easily gain insight into the data. Unfortunately, even though a great many statistical data visualisations have appeared, applications for visualising flow data are still rare. Moreover, these few applications suffer from the following drawbacks: (1) some are standalone and intended only for experts; (2) some do not support the interactive functionality that is essential for visual analytics; and (3) some can only provide an overview of flow data. In this thesis, my aim is to address all three challenges by developing a web-enabled, highly interactive statistical flow data visualisation application with analysis support.

1.2 Objectives
Specifically, in this thesis, I develop an application that fulfils the following objectives:
(1) Enabling a web-based application with real-time interaction.
(2) Solving the cluttering issue.
(3) Providing a variety of useful means to analyse flow data efficiently.
(4) Sharing knowledge between colleagues.
(5) Supporting large datasets.

1.3 Methods
Thanks to GAV Flash, an interactive visualisation component framework that provides leverage for creating interactive visualisation applications [1], my application is inherently web-enabled and supports basic interactive features such as selection, hovering, zooming, panning, filtering and linking. I need to extend these features to flows so that flows are interactive in the same manner as other items in GAV Flash. To solve the cluttering issue, I use a visual analytics approach that combines automated analysis methods and interactive visualisation, following the visual analytics mantra of Keim et al. [2]: “Analyse First - Show the Important - Zoom, Filter and Analyse Further - Details on Demand”. I propose flow analysis methods that find both overview patterns and focus patterns while keeping the visualisation of flows clear. In addition, I tune the appearance of flows so that drawing them takes less space. To make the application extensive, it supports many analysis means, including analysing both flow directions simultaneously, visualising time-series flow data, finding the most attracting regions and reasoning about the found patterns. Users need to save and share what they find while exploring the visualisation, especially when they explore a complex dataset through a number of analytical steps. Instead of simply capturing a screenshot, the application should be able to back up and restore the visualisation so that users can continue interacting with their findings. To this end, I implement the idea of story-telling that Jern describes in [3].
Moreover, a vislet [3], a small visualisation created from the visualisation story, can be embedded into an ordinary web page as an HTML code snippet, so users can create dynamic reports and present knowledge to the public in a much easier and more interesting way. To support large datasets, I divide the whole dataset into smaller, semantically meaningful parts and let users decide which parts are active. Frequently used data are normally set as active by default in a configuration file; however, data can be reloaded on demand.

1.4 Thesis Outline
The thesis is organised as follows. Chapter 1 gives an overview of the thesis. It explains the importance of flow data and the need for a web-enabled, highly interactive statistical flow data visualisation application with analysis support, describes the goals of the thesis and introduces the general ideas of the methods used to fulfil the objectives. Chapter 2 reviews the strengths and weaknesses of the main methods used to solve the overlapping problem, including applying bundling techniques to merge flow edges, grouping regions to form new natural communities, using a line density raster to colour the map and applying random sampling or dynamic filtering to constrain the number of displayed flows. The chapter also points out important unsolved problems that are addressed in the thesis.

Chapter 3 presents the background knowledge needed to understand the rest of the thesis, including visual analytics, the GAV Flash framework, the Model-View-Controller design pattern, the mathematics of quadratic Bézier curves and the linear interpolation technique. Chapter 4 discusses in depth the solutions to the five objectives of the application: how to make the application web-enabled with real-time interactive features; how to solve the cluttering issue using the flow appearance method and, more importantly, the flow analysis method; how to provide a variety of analysis means such as back-and-forth flows analysis, time-series flow data visualisation, finding the most attracting regions and reasoning flow patterns; how to share knowledge through the snapshot mechanism with story-telling and vislets; and finally how to handle large datasets using a divide-and-conquer method. Chapter 5 describes the main and most challenging points of the implementation. It introduces the architecture of the application and explains the implementation of the three Model-View-Controller components it uses: Flow Map Model, Flow Map Views and Flow Map Controller. It also discusses in depth the implementation of snapshots and vislets for the knowledge-sharing objective. Chapter 6 presents the most important and interesting features of the application through a problem-solution approach: a statistical problem is posed and then solved with the application. Chapter 7 briefly summarises the most significant results of the thesis, which achieve all the proposed objectives, and mentions possible future work to improve the application, such as an application wizard, a data conversion wizard and 3D arrows.


Chapter 2. Related Work
So far, there has been much research on flow data visualisation. These studies try to solve one of the most critical problems in flow mapping: clutter. In this chapter, I review the strengths and weaknesses of the main methods used to solve this overlapping problem, including bundling techniques to merge flow edges, grouping regions to form new natural communities, using a line density raster to colour the map and applying random sampling or dynamic filtering to constrain the number of displayed flows. Finally, I point out important unsolved problems that are addressed in my thesis.

Related Research Review
A common and natural approach to visualising flows, or spatial interactions, is the flow map, which uses a straight or curved line to connect the origin and the destination of each flow and employs line width and/or colour to represent the flow volume. Tobler [4] additionally uses two parallel half-arrows pointing in opposite directions to indicate reciprocal flows (Figure 2.1, left). However, when there are many flows on the display, they overlap each other. A traditional flow map that shows all flows runs into this cluttering issue even for small datasets, for example the net migration flows among 48 US states shown in Figure 2.1 (right). One approach to reducing clutter in flow maps is to group edges into bundles. Phan et al. [5] use layout algorithms that minimise edge crossings and try to maintain the relative distances between nodes. More recently, Cui et al. [6] proposed a method that merges edges using a control mesh to guide the edge-clustering process. Moreover, their method allows users to interact with the results through colour and opacity enhancement, mesh adjustment and animation of the merging process. Figure 2.2 compares examples of Phan’s and Cui’s work. This approach gives a very good overview of flows; however, it loses the focus.
Besides, bundling makes it difficult to perceive the connection between the origin and the destination of a flow. Whenever the displayed flows change even slightly, for example when a flow is added or removed, the result needs to be recalculated and rendered again. The new flows may receive a different location and layout, which confuses viewers. Additionally, it is very challenging to visualise both directions at the same time with the bundling technique.

Figure 2.1: Tobler’s Flow Map [4]. Left: reciprocal flows visualisation. Right: an example of a cluttered visualisation.

Figure 2.2: Grouping edges into bundles to reduce clutter [5, 6]. Panels: Phan et al.’s method; Cui et al.’s method.

Another attempt to reduce clutter is made by Guo [7]. Instead of grouping flows, he merges regions. Guo constructs new regions by grouping smaller parts under particular constraints such as size or population. As a result, new natural communities are found and the number of flows is significantly reduced. Guo also argues that important trends can be revealed in these new, bigger regions. Another strength of Guo’s work is the capability of visualising sub-level flow data. For example, suppose that 100 migrants move from A to B; among them are 45 women and 55 men, and 30 people have an income higher than 40,000 USD while 70 have an income less than 40,000 USD. Guo uses a Self-Organizing Map (SOM) to map such multivariate data (here, by sex and income) onto a 2D layout in which regions with similar properties are placed next to each other. The colour returned by the SOM is used to colour the map. Moreover, the multivariate data can also be analysed with a Parallel Coordinates Plot. Figure 2.3 illustrates this work.

Figure 2.3: Multivariate flow mapping at the regional level. A self-organizing map (bottom left), parallel coordinate plot (bottom right), and a flow map (top right) are coordinated to present flow structure, multivariate information, and spatial patterns at the same time [7].

The third approach is to create a line density raster by dividing the map into a grid and computing the density of each cell from the number of elements passing through it [8]. This method provides a nice picture and a good overview but, like the bundling technique, it loses the focus as well as the correspondence between the origin and the destination. Figure 2.4 shows a colourful result for UK migration: people tend to move to Liverpool, Manchester, Birmingham and especially London, which are represented in red. The fourth and simplest approach is to use random sampling [9] or dynamic filtering [8] to show only a portion of the flows. However, this method loses the overview and may destroy important patterns.

Conclusion
Although there has been much research on flow data visualisation, and especially on solving the cluttering problem, many important issues remain. The existing research results are standalone applications intended only for experts. With the exception of Guo [7], most of them do not focus on interactive functionality such as selection, zooming, panning, filtering and linking. Some are time-consuming because of computationally intensive algorithms, and they only provide an overview of the flow data. There is no research that uses the visual analytics approach adopted in my thesis, which focuses on both data analysis and interactive visualisation.

Figure 2.4: UK migration 2000 - 2001 as a line density raster. People tend to move to yellow and red cities like Liverpool, Manchester, Birmingham and London [8].
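The line density raster approach [8] reviewed above can be illustrated with a short Python sketch. It assumes straight-line flows given in map coordinates, a square grid, and a hypothetical sampling `step` (a fraction of the cell size) used to walk along each line; the function name `line_density_raster` is illustrative and not taken from [8]. Each cell is counted at most once per flow, so a cell's value is the number of flow lines passing through it.

```python
from math import floor, hypot


def line_density_raster(flows, width, height, cell_size, step=0.25):
    """Count how many straight flow lines pass through each grid cell.

    flows: list of ((x0, y0), (x1, y1)) segments in map coordinates,
           with the map covering [0, width) x [0, height).
    step:  hypothetical sampling distance, as a fraction of cell_size.
    Returns a list of rows (row index = y cell) of integer counts.
    """
    cols = int(width // cell_size)
    rows = int(height // cell_size)
    grid = [[0] * cols for _ in range(rows)]

    for (x0, y0), (x1, y1) in flows:
        length = hypot(x1 - x0, y1 - y0)
        # Sample the segment densely relative to the cell size.
        samples = max(1, int(length / (step * cell_size)))
        visited = set()  # cells already counted for this flow
        for i in range(samples + 1):
            t = i / samples
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col, row = floor(x / cell_size), floor(y / cell_size)
            if 0 <= row < rows and 0 <= col < cols:
                visited.add((row, col))
        for row, col in visited:
            grid[row][col] += 1
    return grid
```

Sampling along the line is an approximation that can miss a cell which a line only clips at a corner; an exact grid-traversal algorithm would avoid this at the cost of slightly more code.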

Chapter 3. Background
In this chapter, I present the background knowledge needed to understand the rest of my thesis. Section 3.1 describes the definition and applications of visual analytics, the approach that I use in the application. Next, Section 3.2 briefly summarises the GAV Flash framework, on which the implementation of my thesis is based, including its hierarchical architecture, data model, performance, interactive features and methods for sharing knowledge. Section 3.3 introduces the Model-View-Controller design pattern, the basis of the implementation, and its advantages in software engineering. Section 3.4 describes the mathematics of the quadratic Bézier curves used for drawing flows. Finally, Section 3.5 explains the interpolation technique used to compute missing values when animating data.

3.1 Visual Analytics
3.1.1 Definition
According to Thomas and Cook [10], visual analytics is “the science of analytical reasoning facilitated by interactive visual interfaces”. Visual analytics is more than just visualisation: it is a multidisciplinary research field that includes visualisation, human-computer interaction, data analysis and data mining. Nowadays, data are growing faster than ever, and this enormous amount of data makes visualisation more difficult. With large datasets, it is very challenging to create a good overview without losing important information. Keim et al. [2] propose a mantra for visual analytics: “Analyse First - Show the Important - Zoom, Filter and Analyse Further - Details on Demand”. Large datasets must first be analysed with aggressive automatic data mining methods to extract meaningful information; only then are the important findings visualised.

3.1.2 Application
Visual analytics has a very wide range of application areas. It is a suitable solution for any application that has to process and analyse large amounts of data.
In climate and weather monitoring, huge amounts of data are continuously obtained from sensors all over the world. A visual approach helps to gain insight into this large dataset, for example into the relationships between climate factors and their changes. Moreover, interactive visualisation in weather forecast news attracts the public, and this attractiveness helps people remember the forecasts easily. Other applications in the same field, such as global warming studies and hurricane and tsunami warnings, can also benefit from visual analytics. In bioinformatics, researchers use visual analytics to understand large amounts of biological data; many emerging research fields, including the study of proteins in a cell and the study of metabolism, are subjects for the application of visual analytics. Visual analytics is also used for analysing text data. Online social networks are huge data sources; for instance, there are 200 million tweets per day on Twitter. By analysing and visualising these data on a world map, we can understand what is happening all over the world in real time.

3.2 GAV Flash
GAV Flash [1] is an interactive visualisation component framework designed to facilitate the development of customised web-enabled visual analytics applications. The framework is implemented in Adobe’s ActionScript and follows object-oriented programming guidelines such as encapsulation, inheritance and especially modularity through extensive use of interfaces. A module interface describes the essential elements that the module requires to accomplish given tasks (inputs) and/or the results that the module needs to provide (outputs); how the inputs are processed, however, is left to the class that implements the interface. Modules communicate through their interfaces, which makes the framework extensible and maintainable.

3.2.1 Hierarchical Architecture
There are three component levels in the framework: the application level, the functional component level and the atomic component level. Atomic components are independent components that are usually combined into a higher-level functional component; one atomic component can belong to one or more functional components. In turn, functional components are combined to construct an application. For example, a Choropleth Map functional component can contain several atomic components, such as a polygon layer to colour regions or a circle-glyph layer to represent the values of some attribute. The same circle-glyph layer may also be used in the Scatter Plot functional component. An application may then include both a choropleth map and a scatter plot. Functional components normally take nearly the same inputs, for instance the dataset to be represented. This commonality makes it easy to set up a GAV Flash-based prototype.

3.2.2 Data Model
The essential datatype of GAV Flash, DataCube, is the class that represents attribute-space-time data, which is basically a three-dimensional array. Moreover, the classes that convert raw data sources into a DataCube form a category called Data Provider, and the classes that apply transformations to make the data more suitable for visualisation form a category called Data Transformation.
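To make the attribute-space-time model concrete, the following Python sketch mirrors the three roles described above: a cube indexed by attribute, region and time, a data provider that converts raw records into a cube, and a data transformation that produces a new cube. All names (`DataCube`, `records_to_cube`, `scale_attribute`) are hypothetical illustrations, not GAV Flash's ActionScript classes, and the records used with them are toy values.

```python
import copy


class DataCube:
    """Hypothetical attribute x space x time cube (not GAV Flash's DataCube API)."""

    def __init__(self, attributes, regions, times):
        self.attributes = list(attributes)
        self.regions = list(regions)
        self.times = list(times)
        # Label -> index lookups for each of the three dimensions.
        self._attr_index = {a: i for i, a in enumerate(self.attributes)}
        self._region_index = {r: i for i, r in enumerate(self.regions)}
        self._time_index = {t: i for i, t in enumerate(self.times)}
        # values[attribute][region][time]; None marks a missing value.
        self.values = [[[None] * len(self.times) for _ in self.regions]
                       for _ in self.attributes]

    def _series(self, attribute, region):
        return self.values[self._attr_index[attribute]][self._region_index[region]]

    def set(self, attribute, region, time, value):
        self._series(attribute, region)[self._time_index[time]] = value

    def get(self, attribute, region, time):
        return self._series(attribute, region)[self._time_index[time]]

    def time_series(self, attribute, region):
        """All values of one attribute for one region, ordered by time."""
        return list(self._series(attribute, region))


def records_to_cube(records):
    """Toy data provider: (attribute, region, time, value) records -> DataCube."""
    attributes = sorted({rec[0] for rec in records})
    regions = sorted({rec[1] for rec in records})
    times = sorted({rec[2] for rec in records})
    cube = DataCube(attributes, regions, times)
    for attribute, region, time, value in records:
        cube.set(attribute, region, time, value)
    return cube


def scale_attribute(cube, attribute, factor):
    """Toy data transformation: new cube with one attribute multiplied by factor."""
    result = copy.deepcopy(cube)  # leave the input cube untouched
    for region in result.regions:
        for time in result.times:
            value = result.get(attribute, region, time)
            if value is not None:
                result.set(attribute, region, time, value * factor)
    return result
```

For example, `records_to_cube([("population", "A", 2009, 100), ("population", "B", 2010, 50)]).time_series("population", "B")` returns `[None, 50]`, because region B has no value for 2009; in this sketch missing cells are kept as `None` rather than silently treated as zero.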

3.2.3 Performance
GAV Flash is implemented in Adobe’s ActionScript, which leads to two significant advantages. First, the framework is inherently web-enabled. Second, according to a survey by Millward Brown in March 2011 [11], Adobe Flash Player is installed on 99% of Internet-enabled desktops. This indicates that the vast majority of users can use applications built on the GAV Flash framework, with rich Internet application support from Adobe Flash and Flex. However, the framework faces one performance issue because ActionScript does not support multi-threading. The application may become unresponsive for a few seconds when it has to do computationally intensive work or render a high-resolution map. More seriously, the application may crash when the processing time exceeds 15 seconds. To solve this problem, GAV Flash simulates a multi-threaded environment by creating pseudo-threads [1]. A big task is divided into smaller tasks, each small enough to be finished within one frame, and these tasks are assigned to different frames. After executing its task, a pseudo-thread returns control to Flash Player so that users can interact with the application seamlessly.

3.2.4 Interactive Features
GAV Flash supports many useful features that help users interact with the application easily and interestingly:
Selection: allows focusing on an item; for example, the selected region is highlighted on the map.
Hovering: allows quickly glancing at the hovered item; for example, a tooltip briefly shows essential information when the mouse hovers over a region on the map.
Zooming: allows zooming in or out on a region of interest in the map.
Panning: allows dragging the map to a region of interest.
Filtering: allows filtering out irrelevant items based on some condition; for example, only European countries are displayed on the world map.
Linking: allows coordinating multiple views on the display; for example, when Sweden is selected on the map, its corresponding items in the Histogram and Scatter Plot are also selected.

3.2.5 Sharing Knowledge
Snapshot
While using a visualisation tool, users may need to save and share what they discover, especially when they explore a complex dataset through a number of steps, such as filtering the data by one indicator, sorting them by another indicator and zooming in on an interesting view. Instead of simply capturing a screenshot, GAV Flash provides a snapshot capability that backs up the whole visualisation state of the application, including common properties such as selected regions, colour map, filtering conditions and current time, and component-specific properties, for example the current map view in the Choropleth Map; the x, y and size attributes in the Scatter Plot; and the selected indicators in the Histogram. A snapshot is stored as an XML file and can be loaded again to restore the captured visualisation.
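The snapshot principle, saving the visualisation state rather than its pixels, can be sketched with Python's standard `xml.etree.ElementTree` module. The XML element and attribute names (`snapshot`, `common`, `selected`, `component`, `property`) and the layout of the state dictionary are hypothetical; they do not reproduce GAV Flash's actual snapshot format. The sketch only shows that common properties and component-specific properties can be written to XML and read back into an equivalent state.

```python
import xml.etree.ElementTree as ET


def save_snapshot(state):
    """Serialise a visualisation state dict to an XML string (hypothetical schema)."""
    root = ET.Element("snapshot")
    # Common properties shared by all components.
    common = ET.SubElement(root, "common")
    ET.SubElement(common, "time").text = str(state["time"])
    ET.SubElement(common, "colourMap").text = state["colour_map"]
    for region in state["selected"]:
        ET.SubElement(common, "selected").text = region
    # Component-specific properties, one element per component.
    for name, props in state["components"].items():
        comp = ET.SubElement(root, "component", name=name)
        for key, value in props.items():
            ET.SubElement(comp, "property", key=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def load_snapshot(xml_text):
    """Rebuild the state dict from a snapshot XML string."""
    root = ET.fromstring(xml_text)
    common = root.find("common")
    state = {
        "time": int(common.findtext("time")),
        "colour_map": common.findtext("colourMap"),
        "selected": [e.text for e in common.findall("selected")],
        "components": {},
    }
    for comp in root.findall("component"):
        state["components"][comp.get("name")] = {
            p.get("key"): p.text for p in comp.findall("property")
        }
    return state
```

In this sketch component properties round-trip as strings; a real implementation would need typed parsing for numeric values such as zoom levels.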

Storytelling
With the snapshot mechanism described above, users can save and share their findings by capturing a snapshot and storing it in the file system; others can then load it into their application. However, GAV Flash’s snapshot is much more powerful than that: it can be used to author a visualisation story about comprehensive statistical data. A story has many chapters, and each chapter has a plot illustrated by a visualisation capture. Besides plain text, a chapter can be reinforced with hyperlinks to external sources or to further visualisation captures.

Vislet
A vislet is a small visualisation, consisting of a single visualisation component or a composite of several, that can be embedded into web pages. A vislet accepts a story and loads data from it, so the story can be conveyed through a vislet instead of the whole application. This facilitates the storytelling mechanism, because a visualisation story can be placed in any ordinary web page, such as a personal blog, Wikipedia or a statistics report, or even in a PowerPoint presentation.

3.3 Model-View-Controller Design Pattern
Model-View-Controller is a classic architectural pattern in software engineering that separates a program into three distinct parts, Model, View and Controller, which work as follows. First, the user interacts with the user interface and the controller notifies the model of this interaction, normally via an event handler. Second, the model updates itself and notifies the view of this change. Third, the view gets the new data from the model and updates its representation. Figure 3.1 illustrates the concept of the Model-View-Controller pattern.

Figure 3.1: Model-View-Controller concept. The solid line represents a direct association and the dashed line represents an indirect association; for example, via an event notification.
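The three steps above can be sketched in Python with a minimal callback-based event mechanism. The classes `FlowModel`, `FlowView` and `FlowController` and the volume `threshold` they manipulate are hypothetical and are not the application's Flow Map classes. The controller calls the model directly (the solid line in Figure 3.1), the model notifies its registered views through callbacks (the dashed line), and each view then pulls the new data from the model.

```python
class FlowModel:
    """Holds the flow data and notifies listeners when its state changes."""

    def __init__(self, flows):
        self._flows = dict(flows)  # (origin, destination) -> volume
        self.threshold = 0
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def set_threshold(self, value):
        self.threshold = value
        for callback in self._listeners:  # indirect call: event notification
            callback()

    def visible_flows(self):
        return {od: v for od, v in self._flows.items() if v >= self.threshold}


class FlowView:
    """Pulls data from the model and updates its representation."""

    def __init__(self, model):
        self.model = model
        self.rendered = []
        model.add_listener(self.refresh)
        self.refresh()

    def refresh(self):
        flows = self.model.visible_flows()
        self.rendered = sorted(f"{o}->{d}" for o, d in flows)


class FlowController:
    """Translates user interaction into direct calls on the model."""

    def __init__(self, model):
        self.model = model

    def on_slider_changed(self, value):
        self.model.set_threshold(value)  # direct call
```

Because the model knows its listeners only as callbacks, a second view can be attached without changing the model class, which illustrates the third advantage listed in the next paragraph.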
With this separation of concerns, the Model-View-Controller design pattern has the following advantages. First, the code is clearer and the number of lines of code in each file is reduced. Second, code management is safer and more reasonable: the view needs updating much more frequently than the model, especially in web-based applications, so the model class does not need to be touched in that case. Third, one model can have many views; putting all views together and controlling them with conditional clauses would be very messy. Fourth, the program is specialised: business logic and interface design require quite different skills, so by dividing them and assigning each to specialists, the whole program improves and developers' strengths are used efficiently. Fifth, it is easier to port the code to other display devices such as smartphones by changing only the views.

3.4. Flow Visualisation

A flow visualisation should represent all three characteristics of a flow: the origin, the destination and the data volume. Naturally, a flow is represented by an arrow pointing from the origin to the destination, and the arrow's width indicates the flow's data volume. Moreover, for aesthetic reasons, a flow is drawn as a curved arrow, namely a symmetric quadratic Bézier curve. Mathematically, a Bézier curve is a parametric curve defined by its order (linear, quadratic, cubic, etc.) and a set of control points that determine its shape, as follows.

Linear Bézier curves. Given two control points $P_0$ and $P_1$, a linear Bézier curve is simply the straight line between those two points:

$$B(t) = P_0 + t(P_1 - P_0) = (1 - t)P_0 + tP_1, \quad t \in [0, 1].$$

Quadratic Bézier curves. Given three control points $P_0$, $P_1$ and $P_2$, a quadratic Bézier curve is defined by

$$B(t) = (1 - t)\left[(1 - t)P_0 + tP_1\right] + t\left[(1 - t)P_1 + tP_2\right] = (1 - t)^2 P_0 + 2(1 - t)t\,P_1 + t^2 P_2, \quad t \in [0, 1].$$

An example of a quadratic Bézier curve can be seen in Figure 3.2.

Figure 3.2: Construction of a quadratic Bézier curve. Source: http://en.wikipedia.org/wiki/File:Bezier_2_big.png

The symmetric quadratic Bézier curve used in the application is simply a quadratic Bézier curve whose middle control point $P_1$ lies on the perpendicular bisector of the segment $P_0P_2$.
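The following Python sketch evaluates the quadratic Bézier formula and places P1 on the perpendicular bisector of P0P2. The offset parameter `height` (the distance of P1 from the midpoint of the chord) is an assumption for illustration; the thesis does not specify how the application parameterises the curve.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def symmetric_control_point(p0: Point, p2: Point, height: float) -> Point:
    """Return P1 on the perpendicular bisector of P0P2, `height` units from the chord midpoint.

    Positive heights bend the curve to one side of the chord, negative heights to the other.
    """
    mx, my = (p0[0] + p2[0]) / 2, (p0[1] + p2[1]) / 2
    dx, dy = p2[0] - p0[0], p2[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return (mx, my)  # degenerate flow: origin equals destination
    nx, ny = -dy / length, dx / length  # unit normal: direction P0 -> P2 rotated by 90 degrees
    return (mx + height * nx, my + height * ny)


def quadratic_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """B(t) = (1 - t)^2 P0 + 2(1 - t)t P1 + t^2 P2."""
    u = 1 - t
    return (
        u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
        u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1],
    )


def sample_flow_curve(p0: Point, p2: Point, height: float, steps: int = 20) -> List[Point]:
    """Polyline approximation of the symmetric curve, e.g. for drawing an arrow body."""
    p1 = symmetric_control_point(p0, p2, height)
    return [quadratic_bezier(p0, p1, p2, i / steps) for i in range(steps + 1)]


print(symmetric_control_point((0, 0), (4, 0), 2))  # (2.0, 2.0)
print(sample_flow_curve((0, 0), (4, 0), 2, steps=2)[1])  # (2.0, 1.0)
```

Note that the curve's apex lies at half the control-point offset (y = 1 for height = 2). Passing a negative height mirrors the curve, which corresponds to flipping an arrow; and because the normal is computed from the P0 to P2 direction, swapping origin and destination makes a reverse flow bend to the opposite side of the chord.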

3.5. Interpolation

Interpolation is a method of computing new data points within the range of a discrete set of known data points. For speed, the application uses the simplest interpolation method, linear interpolation. Given two data points $(x_a, y_a)$ and $(x_b, y_b)$, the linearly interpolated value $y$ at a point $x$ is given by

$$y = y_a + (y_b - y_a)\,\frac{x - x_a}{x_b - x_a}.$$
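As a sketch of how linear interpolation can smooth the animation of time-series flows, the Python code below applies the formula to insert in-between frames between consecutive time-steps. The `frames_between` parameter is a hypothetical smoothness setting; the thesis does not state how many frames the application inserts.

```python
from typing import List


def lerp(xa: float, ya: float, xb: float, yb: float, x: float) -> float:
    """Linear interpolation: y = ya + (yb - ya) * (x - xa) / (xb - xa)."""
    return ya + (yb - ya) * (x - xa) / (xb - xa)


def animation_frames(values: List[float], frames_between: int) -> List[float]:
    """Insert `frames_between` interpolated values between each pair of consecutive time-steps."""
    if not values:
        return []
    frames: List[float] = []
    span = frames_between + 1  # frame distance between two known time-steps
    for i in range(len(values) - 1):
        for k in range(span):  # k = 0 reproduces the known value values[i]
            frames.append(lerp(0, values[i], span, values[i + 1], k))
    frames.append(float(values[-1]))
    return frames


print(lerp(2006, 100, 2008, 300, 2007))  # 200.0
print(animation_frames([0, 10, 40], 1))  # [0.0, 5.0, 10.0, 25.0, 40.0]
```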

Chapter 4. Approach

In this chapter, I discuss in depth the solutions to the five objectives described in the introductory chapter. Section 4.1 discusses how to make the application web-enabled with real-time interactive features, especially for flows, including selection, hovering and linking. Next, Section 4.2 provides two solutions to the cluttering issue: the flow appearance method and, more importantly, the flow analysis method. Section 4.3 introduces a variety of analysis means such as back-and-forth flow analysis, time-series flow data visualisation, finding the most interesting regions and reasoning about flow patterns. Section 4.4 discusses how the application solves the knowledge-sharing problem through the snapshot mechanism with storytelling and vislets. Finally, Section 4.5 describes the large dataset problem and proposes a divide-and-conquer method to solve it.

4.1. Web-based Application with Real-time Interaction

Thanks to GAV Flash, described in Section 3.2, the application is inherently web-based and supports basic interactive features including selection, hovering, zooming, panning, filtering and linking. Flows are also equipped with suitable interactive features in the same manner as in GAV Flash.

Selection. When an arrow is selected, it is highlighted with a customised colour. Multiple arrows can be selected by holding the Ctrl key.

Hovering. When the cursor hovers over an arrow, a tooltip is displayed to provide brief information about the flow; for instance, Stockholm - Oslo, 2006, Commuting to, 466 persons.

Linking. When arrows are selected in one view, such as the map, all other views, for example the Data Table, also highlight these selected flows.
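Selection, Ctrl multi-selection and linking can be sketched in Python as a single selection object that all views subscribe to. This is an illustrative model, not the application's code: the class and function names are hypothetical, and the behaviour that a click without Ctrl replaces the current selection is an assumption. The tooltip function simply reproduces the format of the example above.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Flow = Tuple[str, str]  # (origin, destination)


class FlowSelection:
    """Selection shared by all linked views (map, data table, ...)."""

    def __init__(self) -> None:
        self.selected: Set[Flow] = set()
        self._views: List[Callable[[FrozenSet[Flow]], None]] = []

    def link(self, view_callback: Callable[[FrozenSet[Flow]], None]) -> None:
        self._views.append(view_callback)

    def click(self, flow: Flow, ctrl_held: bool = False) -> None:
        if ctrl_held:
            self.selected ^= {flow}  # Ctrl toggles the flow in or out of the selection
        else:
            self.selected = {flow}  # assumed: a plain click replaces the selection
        current = frozenset(self.selected)
        for notify in self._views:
            notify(current)  # every linked view highlights the same flows


def tooltip(flow: Flow, year: int, indicator: str, value: float, unit: str) -> str:
    origin, destination = flow
    return f"{origin} - {destination}, {year}, {indicator}, {value} {unit}"


print(tooltip(("Stockholm", "Oslo"), 2006, "Commuting to", 466, "persons"))
# Stockholm - Oslo, 2006, Commuting to, 466 persons
```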

Figure 4.1 shows an example of all three interactive features. Three flows are selected on the map and linked to their corresponding rows in the Flow Data Grid. The tooltip shows information about the flow from Stockholm to Oslo while the cursor hovers over that arrow.

Figure 4.1: An illustration of interactive features of flows: selection, hovering and linking.

4.2. Cluttering Issue

Cluttering occurs when there are many flows on the display and they overlap each other. In this thesis, two methods are applied to solve this issue. The first method improves the appearance of the arrows so that they are less likely to overlap, and the second method limits the displayed flows to the necessary ones while maintaining important patterns.

4.2.1. Flow Appearance Method

Using Curved Arrows. As described in Section 3.4, symmetric quadratic Bézier curves are used for visualising flows. Besides the aesthetic reason, curved arrows also partly reduce clutter: they solve the overlapping problem that arises when the origin and some destinations lie nearly on the same line. Figure 4.2 illustrates this problem and its solution using curved arrows. Admittedly, curved arrows introduce another problem, crossing. Crossing may happen when both flow directions are analysed simultaneously; however, as Figure 4.3 shows, crossing is less critical than overlapping.

Figure 4.2: An example of using curved arrows to solve the cluttering issue. When the origin and destinations lie on the same line, straight arrows overlap (left, "Straight arrows: overlapping issue"); the problem is solved using curved arrows (right, "Curve arrows: problem solved").

Figure 4.3: Crossing happens when both flow directions are analysed at the same time. However, the curved arrows remain easy to read.

Adjusting Arrows Manually. Another way to reduce clutter is to adjust the appearance of curved arrows manually. The application makes it possible to select an arrow and change the properties that keep it from overlapping with other flows: flipping the arrow, the arrow's height, and the distance from the region's centre to the arrow's end. Figure 4.4 shows an example of manually adjusting curved arrows to solve the cluttering issue.

Figure 4.4: An example of manually adjusting arrows to solve the cluttering issue. Left: original curved arrows. Right: arrows manually adjusted by flipping, changing the height and changing the indentation.

Using Less Space-Consuming Arrow Headers. One more solution is to change the appearance of the arrow's header so that it consumes less space. Two improved header types are proposed: removing the anchor, and removing the header as well. Figure 4.5 shows an example of using "non-header" arrows to solve the cluttering issue.

4.2.2. Flow Analysis Method

The flow analysis method is the main solution to the cluttering issue. There is no need to display all flows in the dataset; only the most important flows and the flows the user wants to see need to be visualised. Therefore, the application provides analysis methods to find interesting patterns, including overview patterns and focus patterns. Moreover, any individual flow can also be displayed when needed.

Figure 4.5: An example of using "non-header" arrows to solve the cluttering issue. Left: curved arrows with normal heads. Right: arrows without heads.

Finding Overview Patterns. When analysing the overview of the data, one of the most interesting questions is which flows are the biggest. The application helps answer this question by providing an interface to control the range of biggest flows to be found. Figure 4.6 shows this interface and the visualisation of the biggest flows in the commuting data between Norway and Sweden. One pattern is easy to spot: all of the five biggest flows go from Sweden to Norway.

Finding Focus Patterns. Normally, people pay more attention to some particular regions than to others and want to analyse their flows; for example, the capital and highly populated cities. The application allows setting one or more regions as focus points and analysing only the flows related to them. More interestingly, the application makes it possible to view the top global flows and the top focus flows simultaneously so that they can be compared. This is achieved by rendering the two types of flows onto two different layers. Moreover, the two views are synchronised so that they use the same scale for arrow width, which keeps the flows of both views comparable. Figure 4.7 combines the five biggest flows in Figure 4.6 with the five biggest flows commuting to Oslo. Besides the three flows that also appear in the overview pattern, there are two other flows, from Stockholm and Skåne to Oslo.
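Finding the biggest global flows and the top focus flows, and sharing one width scale between the two layers, can be sketched in Python as follows. The flows are assumed to be a dictionary from (origin, destination) to volume, and `top_flows` and `shared_width_scale` are hypothetical helpers; the `first`/`last` arguments mirror the dual slider range (e.g. 1 - 5).

```python
import heapq
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Flow = Tuple[str, str]  # (origin, destination)
Ranked = List[Tuple[Flow, float]]


def top_flows(
    flows: Dict[Flow, float],
    first: int,
    last: int,
    focus: Optional[Set[str]] = None,
    incoming: bool = True,
) -> Ranked:
    """Flows ranked first..last (1-based, inclusive) by volume.

    Without `focus` this gives the top global flows; with `focus` only flows into
    (or, if incoming is False, out of) a focus region are ranked.
    """

    def related(od: Flow) -> bool:
        if focus is None:
            return True
        return (od[1] if incoming else od[0]) in focus

    candidates = [(od, v) for od, v in flows.items() if related(od)]
    return heapq.nlargest(last, candidates, key=lambda item: item[1])[first - 1:]


def shared_width_scale(layers: Iterable[Ranked], max_width: float = 10.0) -> Callable[[float], float]:
    """One width scale for every layer, so arrows on different layers stay comparable."""
    biggest = max((v for layer in layers for _, v in layer), default=0.0)
    if biggest == 0:
        return lambda v: 0.0
    return lambda v: max_width * v / biggest


flows = {("A", "Oslo"): 50.0, ("B", "Oslo"): 30.0, ("C", "D"): 40.0, ("E", "Oslo"): 5.0}  # made-up
global_layer = top_flows(flows, 1, 2)
focus_layer = top_flows(flows, 1, 2, focus={"Oslo"})
width = shared_width_scale([global_layer, focus_layer])
print(width(25.0))  # 5.0
```

Computing the scale from both layers at once is what keeps a focus flow and a global flow of equal volume equally wide.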

Figure 4.6: An overview pattern can be revealed by finding the biggest flows. These are the five biggest flows of commuting data between Norway and Sweden. All five go from Sweden to Norway; three of them go to Oslo and the remaining two go to Norwegian counties that border Sweden.

Figure 4.7: Overview pattern and focus pattern in the same visualisation. The three biggest commuting flows to Oslo are among the five biggest global flows between Norway and Sweden.

4.3. Extensive Analysis

The aim of the application is to provide users with a variety of means to analyse flow data, and the flow analysis method of Section 4.2.2 is one of them. In this section, I describe further methods that help to analyse the data efficiently.

4.3.1. Back-and-Forth Flows Analysis

The application supports analysing both flow directions at the same time. All flows use the same scale so that they can be compared. Back and forth flows have opposite arrow directions and different colours, and they are symmetric about the axis connecting the two regions' centres. Moreover, there can be more than one flow type for each flow direction. For instance, trade data has two types of incoming flow: Import and Re-import. The application can display these two types concurrently by distinguishing them by height; all flows of the same type have the same height. Figure 4.8 illustrates the analysis of back-and-forth flows (left) and of back-and-forth flows with two different flow types for each direction (right).

Figure 4.8: An example of visualising multiple flow types. Left: back-and-forth flows. Right: back-and-forth flows with two different flow types for each direction (four flow types). Flows are distinguished by colour and height.

4.3.2. Time-series Flow Data Visualisation

There are two ways to visualise time-series data. The first method shows the data of all time-steps statically, and the second method shows the data of each time-step dynamically, with smooth transitions, so that the data is perceived as motion over time.

Static Glyph. When a region is focused, a map layer is used for showing the data in all time-steps. Each region contains a glyph, a small bar or line chart, showing the flows between this region and the focus point.
All glyphs can share the same scale so that they can be compared with each other to extract an overview pattern. Alternatively, each glyph can use its own scale, which helps reveal useful patterns in time-varying data even when the values are small.
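The difference between a shared scale and per-glyph scales can be shown with a small Python sketch that turns each region's time series into bar heights. The function name, the `max_height` value (in pixels) and the input layout (region to list of values per time-step) are illustrative assumptions.

```python
from typing import Dict, List


def glyph_bar_heights(
    series_by_region: Dict[str, List[float]],
    max_height: float = 20.0,
    shared: bool = True,
) -> Dict[str, List[float]]:
    """Bar heights for each region's temporal glyph.

    shared=True: one scale for all glyphs, so glyphs can be compared with each other.
    shared=False: each glyph uses its own maximum, which exposes trends in small series.
    """
    global_peak = max((v for s in series_by_region.values() for v in s), default=0.0)
    heights: Dict[str, List[float]] = {}
    for region, series in series_by_region.items():
        peak = global_peak if shared else max(series, default=0.0)
        heights[region] = [max_height * v / peak if peak else 0.0 for v in series]
    return heights


series = {"A": [10.0, 20.0], "B": [1.0, 2.0]}  # made-up values
print(glyph_bar_heights(series)["B"])  # [1.0, 2.0]
print(glyph_bar_heights(series, shared=False)["B"])  # [10.0, 20.0]
```

On the shared scale, region B's bars are barely visible; on its own scale the same doubling reaches full height, which is the trade-off described above.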

Figure 4.9 shows an example of the trade pattern between the USA and Germany from 1988 to 2008 using this static temporal glyph.

Figure 4.9: An illustration of a static temporal glyph using bar charts of trade between Germany and the USA from 1988 to 2008. Trade has gradually increased over time, and trade from Germany to the USA is slightly higher than trade from the USA to Germany.

Dynamic Animation. Flows are continuously rendered using the data of each time-step. There may not be enough time-steps for a smooth animation; thus, the interpolation technique described in Section 3.5 is used to compute in-between values for each pair of consecutive time-steps. With a smooth animation, the pattern of data change can be perceived in a very natural way; for example, the flow broadens over time as in Figure 4.10.

Figure 4.10: An illustration of dynamic animation of trade between Germany and France from 2002 to 2008. The seven consecutive images convey the impression that the data grows over time.

4.3.3. Finding the Most Attractive Regions

Besides analysing flows, another concern is which regions have many flows passing through them; here the focus is on the regions rather than on the flows themselves. To answer this question, flows are aggregated by region, and a map layer shows the aggregate data. Each region contains a circle glyph whose area represents the aggregate volume of the region, as in Figure 4.11.
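Aggregating flows by region and sizing the circle glyphs can be sketched in Python as below. The data layout and helper names are assumptions. The point worth noting is that, because the circle's area (not its radius) represents the volume, the radius must grow with the square root of the volume; scaling the radius linearly would make big regions look disproportionately large.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Flow = Tuple[str, str]  # (origin, destination)


def aggregate_by_region(flows: Dict[Flow, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Total incoming and outgoing volume per region."""
    incoming: Dict[str, float] = defaultdict(float)
    outgoing: Dict[str, float] = defaultdict(float)
    for (origin, destination), volume in flows.items():
        outgoing[origin] += volume
        incoming[destination] += volume
    return dict(incoming), dict(outgoing)


def circle_radius(volume: float, max_volume: float, max_radius: float = 30.0) -> float:
    """Circle AREA proportional to volume, so the radius grows with the square root."""
    return max_radius * math.sqrt(volume / max_volume)


flows = {("A", "B"): 4.0, ("C", "B"): 5.0, ("B", "A"): 1.0}  # made-up values
incoming, outgoing = aggregate_by_region(flows)
print(incoming)  # {'B': 9.0, 'A': 1.0}
print(circle_radius(25.0, 100.0))  # 15.0
```

Keeping incoming and outgoing totals separate matches the two circle colours (incoming and outgoing) used in the figure that follows.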

Figure 4.11: An illustration of finding the most interesting regions based on commuting data between Norway and Sweden. Blue circles represent the incoming direction and green circles the outgoing direction. A very clear pattern is that people tend to leave Sweden to work in Norway. Oslo is the most attractive county in Norway, and Västra Götaland is the region that people commute out of most.

4.3.4. Reasoning About Flow Patterns

All the aforementioned analysis methods help to find valuable patterns in flow data. Naturally, there is a need to find the reasons behind these patterns; for example, why people in Sweden tend to work in Norway. Perhaps some property of the destination is "better" than that of the origin, such as GDP per capita, the unemployment rate or some other regional attribute. The answer might therefore appear if the pattern of the flow data matches the pattern of the regional data. For this reason, the application supports analysing and visualising regional data together with flow data to find the reasons for the movement. Regional data is visualised by the colour of the polygon layer using a given colour map and/or by other GAV Flash components such as the Histogram.

4.4. Sharing Knowledge

As described in Section 3.2.5 of the background chapter, GAV Flash addresses the knowledge-sharing problem through the snapshot mechanism, storytelling and vislets. Flow Map Application inherits these features by implementing the defined interfaces.

4.4.1. Snapshot

The application supports the snapshot feature, and the following are saved and loaded:
- all drawn flows, including individual flows, top flows of a focus point, top global flows and, in some case studies, other special flows;
- all drawn glyphs, including trend glyphs and top hub glyphs;
- all arrow and glyph settings;
- focus points, selected destinations, selected flows, selected indicators and the selected time range;
- map-related properties such as the choice of background maps, background map visibility, heat map opacity and the view of interest.

4.4.2. Vislet

In principle, the vislet in Flow Map Application follows the vislet of GAV Flash. However, some adaptation is needed because of the nature of flow data, which relates pairs of regions. Details of all changes are described in the implementation chapter, Section 5.2.2.

4.5. Large Dataset

4.5.1. Problem

The OECD Trade case study, which visualises trade flows of OECD countries, is used to illustrate the large dataset problem. There are 4 flow types (Import, Export, Re-import and Re-export) for more than 4000 commodities in 21 time-steps among more than 200 countries. As a result, the data for just one focus point exceeds 1 GB, which makes loading the whole dataset for visualisation impractical.

4.5.2. Divide-and-Conquer Solution

My solution is to divide the huge dataset into smaller parts and load only some manageable parts when the application starts up. Users can then modify the active dataset to add or remove any parts they want. Naturally, the data must be divided so that related data stays together and no misleading patterns arise. The data is split into many data files, each for one combination of a focus point and a commodity. For example, one data file contains all four flow types of a particular commodity coming into or going out of Austria, from and to all countries, in all time-steps. The reason for this division is that users normally focus on just a few focus points for a particular commodity, but they need to analyse the data of all flow types (to compare import and export), in all time-steps (to find patterns over time) and for all countries (to find the biggest flows).
The default active dataset is fully configurable by users via a configuration file.
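With the figures quoted above, one focus point covers at least 4 × 4000 × 21 × 200 = 67,200,000 values, whereas a single (focus point, commodity) file holds at least 4 × 21 × 200 = 16,800 values. The Python sketch below shows an active dataset that loads such parts on demand. The loader callback, the record layout and the commodity code "C1" are hypothetical, and reading the default start-up parts from the configuration file is not modelled.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Part = Tuple[str, str]  # (focus point, commodity): one data file per combination
Record = Tuple[str, str, str, int, float]  # hypothetical: (partner, flow type, commodity, year, value)


class ActiveDataset:
    """Keeps only the parts the user has activated in memory."""

    def __init__(self, load_part: Callable[[str, str], List[Record]]):
        self._load_part = load_part  # hypothetical loader reading one data file
        self._parts: Dict[Part, List[Record]] = {}

    def add(self, focus: str, commodity: str) -> None:
        key = (focus, commodity)
        if key not in self._parts:  # load each file at most once
            self._parts[key] = self._load_part(focus, commodity)

    def remove(self, focus: str, commodity: str) -> None:
        self._parts.pop((focus, commodity), None)

    def active_parts(self) -> List[Part]:
        return list(self._parts)

    def records(self) -> Iterator[Record]:
        for part in self._parts.values():
            yield from part


calls: List[Part] = []


def fake_loader(focus: str, commodity: str) -> List[Record]:
    calls.append((focus, commodity))
    return [("Germany", "Import", commodity, 2008, 1.0)]  # made-up record


data = ActiveDataset(fake_loader)
data.add("Austria", "C1")
data.add("Austria", "C1")  # already active: no second load
print(len(calls))  # 1
```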

Chapter 5. Implementation

In this chapter, I describe the main and most challenging points of the application's implementation. Section 5.1 introduces the architecture of the application, giving the implementation ideas behind the three components of the Model-View-Controller design pattern that the application uses: Flow Map Model, Flow Map Views and Flow Map Controller. Section 5.2 discusses in depth the implementation of snapshots and vislets for the knowledge-sharing objective of the application.

5.1. Application Architecture

When the application starts, the program loader parses the configuration file to fetch the uniform resource identifiers (URIs) of map files, data files, story files and presets; the presets are used to build the desired start-up state when no story is available, for example the default focus point, the default active attribute and so on. After reading the configuration file, the application loads the map files, data and presets into memory and passes them to the panel class, which is the heart of the application. Figure 5.1 illustrates this step.

Figure 5.1: An overview of the application loader. The program loads maps, data and presets and passes them to the panel.

The application is based on the Model-View-Controller design pattern described in Section 3.3. The model generates the data the views need to draw flows: a list of triples (value, origin position, destination position). The views are responsible for visualising flows and other glyphs. The controller receives user interaction and interacts with the model to retrieve the flows relevant to this change. Figure 5.2 gives an overview of Flow Map Model, Flow Map Views, Flow Map Controllers and their interactions. Details of these three components are discussed in the sections below.

Figure 5.2: Three components of the Flow Map Panel and their communication.

5.1.1. Flow Map Model

Data Model. The model is the component that receives the data passed to the panel, including flow data and regional data. However, this global flow dataset is not sufficient to work with, for two reasons. First, very frequently only a small subset of the data, based on the focus points, is used. Second, the structure of the global flow dataset does not conform to the existing GAV Flash built-in components. The latter reason is explained in depth as follows. As discussed in Section 3.2.2, the key datatype of GAV Flash is DataCube, a wrapper class of a three-dimensional array that represents the space-attribute-time nature of the data, as in Figure 5.3. A DataCube can be considered a spatial list of two-dimensional arrays of attribute-time data. Flow data, however, is more complicated: the "spatial" list of flow data is actually a directed combination of two spaces. Therefore, it does not match a normal DataCube object and cannot be used with standard GAV Flash components such as the Choropleth Map, the Colour Legend and the Histogram. Fortunately, the global flow dataset is only used for visualising individual flows or top global flows, and all other visualisations that require a standard dataset need only a small sub-dataset for the focus points. The solution is to convert this partial dataset into a GAV Flash IDataCube. The global flow dataset is organised as in Figure 5.4: the spatial dimension contains a list of spatial interactions grouped by origin, the attribute dimension includes the attributes replicated for each flow, and the temporal dimension normally stores all time-steps.
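The organisation of the global flow dataset (records grouped by origin) and the fast, copy-free focus extraction it enables can be sketched in Python, with binary search standing in for whatever lookup the actual code uses. The class name and its nested-list layout are hypothetical, and the sketch assumes that the focus point plays the role of the origin, as the grouping suggests.

```python
import bisect
from typing import List, Tuple

Flow = Tuple[str, str]  # (origin, destination)
AttrTime = List[List[float]]  # attribute x time-step values of one flow


class GlobalFlowCube:
    """Hypothetical stand-in for the global flow DataCube: flows sorted and grouped by origin."""

    def __init__(self, flows: List[Flow], values: List[AttrTime]):
        order = sorted(range(len(flows)), key=lambda i: flows[i])
        self.flows = [flows[i] for i in order]
        self.values = [values[i] for i in order]
        self._origins = [origin for origin, _ in self.flows]

    def focus_slice(self, origin: str) -> Tuple[List[Flow], List[AttrTime]]:
        """All flows of one origin, found by binary search on the sorted origins.

        Slicing copies only references: the attribute-time arrays are shared, not duplicated.
        """
        lo = bisect.bisect_left(self._origins, origin)
        hi = bisect.bisect_right(self._origins, origin)
        return self.flows[lo:hi], self.values[lo:hi]


values = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]  # made-up: 1 attribute x 2 time-steps
cube = GlobalFlowCube([("Oslo", "Stockholm"), ("Bergen", "Oslo"), ("Oslo", "Dalarna")], values)
focus_flows, focus_values = cube.focus_slice("Oslo")
print(focus_flows)  # [('Oslo', 'Dalarna'), ('Oslo', 'Stockholm')]
print(focus_values[0] is values[2])  # True
```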

Figure 5.3: The DataCube datatype represents a three-dimensional array of space-attribute-time data.

Figure 5.4: Data structure of the global flow DataCube.

When a region is focused, all data related to this focus point is extracted from the global dataset to form a focus dataset, as in Figure 5.5. The extraction is fast because the data is already sorted by origin, and it is memory-efficient because only references to the data arrays are copied.

Figure 5.5: Data structure of the focus flow DataCube: the same as the global DataCube but containing only records related to the focus point.

The case of multiple focused regions is more complex. First, a single DataCube is built for each focus point. Second, these DataCubes are merged into an IDataCube; the class MergeDataCube, which implements the interface IDataCube, is designed for this combination. Because every DataCube has the same structure, these instances can be merged along the spatial and temporal dimensions, whereas the attributes are replicated. This merging is memory-efficient because no data is duplicated. Figure 5.6 illustrates this merging step.

Figure 5.6: Data structure of the merged flow DataCube: DataCubes are merged so that they share the spatial and temporal dimensions.

5.1.2. Flow Map Views

There are many independent views in the application; each view visualises a particular part of the data of interest. The views get the data they need from the model. When the model changes, an event is sent to the relevant views, and they fetch the new data from the model to update their rendering. All four views used in the application are glyph layers and implement IMapLayer. The Focus and Global Flows views are bi-positional glyph layers, while the Top Hubs View and the Static Temporal View are mono-positional glyph layers.
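Returning to the multi-focus merging step of Section 5.1.1, the following Python sketch illustrates why merging needs no data duplication: a read-only view forwards each index to the focus cube that owns the record. It stands in for MergeDataCube under one reading of the description (the spatial lists of the focus cubes are concatenated while every record keeps the common attribute-time layout); the class name and layout are hypothetical, and the sketch does not reproduce the IDataCube interface.

```python
from typing import List

AttrTime = List[List[float]]  # attribute x time-step values of one flow


class MergedFlowCube:
    """Read-only view presenting several focus cubes as one spatial list.

    Nothing is copied: indexing is forwarded to the cube that owns the record.
    """

    def __init__(self, cubes: List[List[AttrTime]]):
        self._cubes = cubes

    def __len__(self) -> int:
        return sum(len(cube) for cube in self._cubes)

    def __getitem__(self, index: int) -> AttrTime:
        if index < 0:
            index += len(self)
        if index < 0:
            raise IndexError("merged cube index out of range")
        for cube in self._cubes:
            if index < len(cube):
                return cube[index]
            index -= len(cube)
        raise IndexError("merged cube index out of range")


stockholm = [[[1.0, 2.0]]]  # made-up focus cube with one record
vastra = [[[3.0, 4.0]], [[5.0, 6.0]]]  # made-up focus cube with two records
merged = MergedFlowCube([stockholm, vastra])
print(len(merged))  # 3
print(merged[1] is vastra[0])  # True
```

Because `__getitem__` raises `IndexError` past the end, the view can also be iterated with a plain `for` loop.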

Bi-positional Glyph Layer. Listing 5.1 briefly illustrates how to use the Focus Flow View. This view needs two types of information: the value of the flow (normalizedValues), and the data that identifies the flow's location (biPositionsProvider) and its appearance (flowTypes, flowGroups).

Listing 5.1: An example of Focus Flow View
```xml
<genericMap:GenericMap>
    <layers:BiPositionGlyphLayer biPositionsProvider="">
        <classes:DynamicCurveArrowGlyphFactory normalizedValues="" flowTypes="" flowGroups=""/>
    </layers:BiPositionGlyphLayer>
</genericMap:GenericMap>
```

Mono-positional Glyph Layer. Listing 5.2 briefly illustrates how to use the Static Temporal View. This view also needs two types of information: the value of the flow (dataCube, startSlice, endSlice), and the data that identifies the locations (glyphInfoProvider) and the appearance (visibleAttributes, attributeColors).

Listing 5.2: An example of Static Temporal View
```xml
<genericMap:GenericMap>
    <layers:MonoPositionGlyphLayer glyphInfoProvider="">
        <glyphs:BarLineChartGlyphFactory dataCube="" startSlice="" endSlice="" visibleAttributes="" attributeColors=""/>
    </layers:MonoPositionGlyphLayer>
</genericMap:GenericMap>
```

5.1.3. Flow Map Controller

There are many kinds of interaction between users and the application through controllers. The controllers handle all these interactions and force the model to recompute, so that the views are updated in turn.

Focus and Destination. When a region is focused, the controller sends this information to the model so that the model can update the focus dataset. The focus dataset is shared between many components, so the change updates all of them, including the Focus Flows View, the Top Hubs View and the Static Temporal View.

When a destination is selected, the controller also sends this information to the model, and the model updates the active flows. Consequently, the Focus Flows View is updated.

Analysis Tools Change. Each analysis tool, such as finding top focus flows, top global flows and top hubs, is controlled by a separate component. When there is a change, for example when the Focus Flows slider moves, the corresponding control recalculates the top flows based on its model and the change. After obtaining the top flows, the controller sends this information to the model, which uses it to build updated data for the views.

Active Indicator and Flow Types Selection. The combination of indicator and flow type determines the attribute in the attribute dimension of the dataset. A change of this combination affects the analysis controllers described above and causes them to update.

5.2. Vislet

5.2.1. Snapshot

GAV Flash Snapshot Implementation. Every visualisation component that supports the snapshot feature has to implement an interface providing a snapshot writer and a snapshot reader that handle its writing and reading, so that GAV Flash can take the component into account when processing a snapshot.

Flow Map Snapshot Implementation. Following the same approach as other components in GAV Flash, the application implements Flow Map Panel Writer and Flow Map Panel Reader to write the panel to an XML file and to read it back from an XML file into the panel. Moreover, every control that needs to be backed up and restored implements an interface containing write and read methods. Each control's implementation should write/read its necessary properties and call its children's write/read methods recursively.

5.2.2. Vislet
Embedded HTML Code. The HTML code representing a vislet is a typical object tag (Listing 5.3 is a reduced example) which defines:
- the location of the Vislet SWF file;
- the layout customisation properties;
- the components to use;
- the visualisation story to load;
- the Google Maps license key for using Google Maps on the current host;
- an embed tag with the same properties as the object tag, so that the SWF file loads in many browsers, including Internet Explorer, Firefox and Opera.

Listing 5.3: A reduced example of embedded HTML code for a vislet
```html
<object ...>
    <param name="movie" value="Vislet.swf?components=(ScatterPlot,ParallelCoordinates);story=histogram-world.xml;GoogleMapsKey="></param>
    ...
    <embed src="Vislet.swf" ...></embed>
</object>
```

Vislet Creation Process. The Vislet SWF file reads the data passed from the HTML code to build the components and then loads the story into them. The process of creating a vislet in GAV Flash is as follows:
1. parse the parameters passed from the HTML code;
2. load the configuration file to get the map files and provide the data set for the vislet;
3. read the story;
4. create the visualisation components and invoke the snapshot reader of each component.

Typically, a vislet visualisation component (V) is a wrapper of its standard visualisation component (S). S's data set is the data set obtained in the second step of the process above. V needs to provide S's snapshot reader so that S's visualisation capture can be loaded into V. Moreover, V can define its own special widgets or settings, because a vislet is presentation- or demonstration-oriented, i.e., it does not need to be as comprehensive as the standard visualisation components.

5.2.3. Flow Map Vislet

To build a vislet for the application, I use the same approach as other vislet components; however, it is more complicated, and the following things need to be customised. In the configuration-loading step of the Vislet Creation Process, other vislet components use a standard data provider module of GAV Flash to load map files (for the Vislet Choropleth Map Panel) and the data set. This module handles only one map and is specially designed for eXplorer, a particular application based on GAV Flash, whereas the flow map vislet needs an optional secondary map for country boundaries. Therefore, I built my own data provider module, which implements IDataProviderModule.
The module accepts a configuration file and outputs two EsriShp objects for the two maps, together with the data set. The data set of the flow map vislet is dynamic because it changes when the focus points are altered; technically, this data set needs to be bound to the inputs instead of simply assigned. A compact settings tool with the same functionality as the application header is built for the flow map vislet so that users can explore flows and change indicators or flows.
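The first step of the Vislet Creation Process, parsing the parameters passed from the HTML code, can be sketched in Python using the parameter string from Listing 5.3. The `key=value;` syntax and the parenthesised component list are read off that listing; the function is hypothetical and is not the actual Vislet SWF code.

```python
from typing import Dict, List, Tuple, Union

ParamValue = Union[str, List[str]]


def parse_vislet_params(movie_value: str) -> Tuple[str, Dict[str, ParamValue]]:
    """Split 'Vislet.swf?key=value;key=(a,b)' into the SWF path and a parameter dict.

    Parenthesised values such as components=(ScatterPlot,ParallelCoordinates) become lists.
    """
    swf, _, query = movie_value.partition("?")
    params: Dict[str, ParamValue] = {}
    for item in query.split(";"):
        if not item.strip():
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if value.startswith("(") and value.endswith(")"):
            params[key] = [part.strip() for part in value[1:-1].split(",") if part.strip()]
        else:
            params[key] = value
    return swf, params


swf, params = parse_vislet_params(
    "Vislet.swf?components=(ScatterPlot,ParallelCoordinates);"
    "story=histogram-world.xml;GoogleMapsKey="
)
print(swf)  # Vislet.swf
print(params["components"])  # ['ScatterPlot', 'ParallelCoordinates']
```

This simple split assumes that no value contains `;` or `=`; a real key or story path containing those characters would need escaping.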


Chapter 6. Results

In this chapter, I present the most important and interesting features of my thesis through a problem-solution approach. A statistical problem is proposed (Section 6.1) and then solved by my application (Section 6.2). The step-by-step instructions are detailed enough to serve as a guide to the application.

6.1. Statistical Problem

To help readers thoroughly understand the main functionalities of the application, as well as how the application can assist in analysing real-world issues, a statistical problem about "Commuting between Norway and Sweden from 2006 to 2008" is proposed and then solved by the application itself. To gain insight into the data, I suggest answering the following seven questions:
1. Which Norwegian counties commute to Stockholm most? (*)
2. Compare (*) with Norwegian counties commuting to Västra Götaland most.
3. Compare flows in (*) with their inverse flows.
4. Compare flows in (*) with the most noticeable commuting flows between the two countries.
5. Why are the flows towards Norway so dominant?
6. Which counties attract people most?
7. What is the trend of people commuting between Oslo and Stockholm?

6.2. Problem Solving Using the Application

The seven questions are answered sequentially in the following subsections.

6.2.1. Which Norwegian counties commute to Stockholm most?

First, we need to set the focus point to Stockholm. In the top-left corner of the map, there are two buttons that control the behaviour of selection. Click the left button to set the selection mode to Focus Selection Mode, and then select Stockholm on the map. The focus region is highlighted in yellow. If you are not familiar with the map or the region of interest is too small, you can select it from the alphabetically sorted list of regions in the Focus Points tab of the Settings Panel on the left-hand side of the map (this panel is closed by default). Second, we set the indicator to Commuting and the flow to Incoming by selecting them from the two combo-boxes at the top of the application. Finally, we open the Top Focus Flows analysis dialogue by clicking the leftmost button in the top toolbar (or via the menu Settings - Exploration, selecting the Top Focus Flows tab) and set the range of flows of interest to 1 - 5 using the dual slider. Note that the On/Off button controls the visibility of the top flows, so make sure it is on. The Value button on the right-hand side shows or hides detailed information about these top flows. An interesting pattern is that Oslo is the county from which the most people commute to Stockholm, and Akershus is second. Many appearance attributes of the flows can be customised in the menu Settings - Map Arrow (tab All Arrows), including arrow header type, colour, width scale, height scale and so on. An arrow can also be selected and customised individually in the tab Selected Arrow (Figure 6.1). There are several means to show the details of these top flows, including the Top Focus Flows dialogue, the Focus Flow Data tab and the Histogram.

Figure 6.1: Finding the top five Norwegian counties commuting to Stockholm. The focus point is set using the map or the alphabetical list. Indicator and flows are controlled by combo-boxes in the toolbar. The range of flows is set in the Top Focus Flows dialogue. The appearance of arrows can be customised; for instance, the arrows in this figure have no headers.

6.2.2. Compare (*) with Norwegian counties commuting to Västra Götaland most

Now we focus not only on Stockholm, the county of the Swedish capital, but also on Västra Götaland, the second largest county in Sweden. There are two ways to add another region as a focus point. The first way is to use the map: hold down the Ctrl key and click on Västra Götaland, making sure that Focus Selection Mode is active as described in Section 6.2.1. The second way is to use the alphabetical list of regions: click the On/Off button at the top of the list to enable multi-selection and check the Västra Götaland item. Both Stockholm and Västra Götaland should now be highlighted in yellow, and ten flows are displayed: the five top flows to Stockholm and the five top flows to Västra Götaland. One conclusion that can be drawn is that more people commute to Västra Götaland for work than to Stockholm. This is expected, because Västra Götaland is the second largest county and is much nearer to Norway than Stockholm is. The colour of each focus point can be changed so that the flows of different focus points are distinguishable. In Figure 6.2, the colour of Västra Götaland was changed using the second colour picker in the Incoming Colour row of the Arrow Settings dialogue.

Figure 6.2: Top five flows commuting to Stockholm and Västra Götaland. The colour of each focus point can be customized in the Arrow Settings dialogue.
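With several focus points, the same ranking is applied once per focus region, which is why two focus points and a top-five range yield ten flows. The sketch below assumes flows are given as hypothetical `(origin, destination, commuters)` tuples; the optional `origin_filter` stands in for restricting origins to Norwegian counties, and `total_of` is a rough way to compare the two focus regions numerically. All names and numbers are illustrative, not taken from the application or its data.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Flow = Tuple[str, str, float]  # (origin, destination, commuters)


def top_flows_per_focus(flows: Iterable[Flow], focus_points: Iterable[str],
                        k: int = 5,
                        origin_filter: Optional[Callable[[str], bool]] = None
                        ) -> Dict[str, List[Flow]]:
    """Group incoming flows by focus point and keep the k largest for each."""
    focus_set = set(focus_points)
    grouped: Dict[str, List[Flow]] = defaultdict(list)
    for origin, dest, value in flows:
        if dest not in focus_set or origin == dest:
            continue
        if origin_filter is not None and not origin_filter(origin):
            continue
        grouped[dest].append((origin, dest, value))
    # Rank each focus point's flows separately, largest first.
    return {f: sorted(grouped[f], key=lambda fl: (-fl[2], fl[0]))[:k]
            for f in focus_set}


def total_of(selected: List[Flow]) -> float:
    """Sum the commuters carried by a list of flows."""
    return sum(value for _, _, value in selected)


if __name__ == "__main__":
    # Hypothetical codes: N* are origins in one country, S and V are the
    # two focus regions, X is an origin excluded by the filter.
    demo = [("N1", "S", 50), ("N2", "S", 40), ("N1", "V", 90),
            ("N3", "V", 70), ("X", "V", 300)]
    norway = {"N1", "N2", "N3"}
    top = top_flows_per_focus(demo, ["S", "V"], k=5,
                              origin_filter=lambda o: o in norway)
    for focus, selected in sorted(top.items()):
        print(focus, selected, total_of(selected))
```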

6.2.3. Compare flows in (*) with their inverse flows

Section 6.2.1 showed which counties send the most commuters to Stockholm. What about the inverse flows? Do people from Stockholm also tend to commute to these counties? Simply open the Top Focus Flows analysis dialogue and click the Inverse button; the inverse flows of the top incoming flows are then displayed (Figure 6.3). From the result, we can conclude that more people tend to leave Stockholm to work in Norway than the other way round.

Figure 6.3: Top five flows commuting to Stockholm and their inverse flows.

6.2.4. Compare flows in (*) with most noticeable commuting flows between two countries

In this analysis, we compare the top commuting flows to Stockholm with the top commuting flows between Norway and Sweden to see how large and important they are. Click the second icon on the Exploration toolbar to open the dialogue for exploring global flows, and make the flows visible by switching on the On/Off button. Both the top local flows to Stockholm and the top global flows are now shown (Figure 6.4).

6.2.5. Why are the flows towards Norway so dominant?

Figure 6.4 shows that the incoming flows to Norway are much larger than the incoming flows to Stockholm, and we try to explain this using regional data. The Flow Map Application allows users to work with flow data and regional data at the same time. First, open the menu Settings - Miscellaneous: the third option determines whether regional data is included, and the fourth controls whether it is shown in the tooltip. Make sure both options are checked.
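The inverse-flow comparison of Section 6.2.3 and the global ranking of Section 6.2.4 reduce to two small operations: looking up the reverse pair of each selected flow, and ranking all region pairs regardless of any focus point. The sketch below assumes a hypothetical dictionary keyed by `(origin, destination)`; treating a missing reverse pair as zero is also an assumption, not documented behaviour of the application.

```python
from typing import Dict, List, Tuple

Flows = Dict[Tuple[str, str], float]  # (origin, destination) -> commuters
Flow = Tuple[str, str, float]


def inverse_flows(flows: Flows, selected: List[Flow]) -> List[Flow]:
    """Return the reverse flow (destination -> origin) of every selected flow.

    A reverse pair that is absent from the data is reported as 0.
    """
    return [(d, o, flows.get((d, o), 0)) for o, d, _ in selected]


def top_global_flows(flows: Flows, k: int = 5) -> List[Flow]:
    """Return the k largest flows over all region pairs, ignoring self-loops."""
    ranked = sorted(((o, d, v) for (o, d), v in flows.items() if o != d),
                    key=lambda f: (-f[2], f[0], f[1]))
    return ranked[:k]


if __name__ == "__main__":
    # Illustrative values only.
    demo = {("A", "S"): 120, ("S", "A"): 200, ("B", "S"): 80, ("X", "Y"): 500}
    print(inverse_flows(demo, [("A", "S", 120), ("B", "S", 80)]))
    print(top_global_flows(demo, 2))
```

The `selected` list could come from any top-k query over flows into a focus region; putting each selected value next to its inverse is the numeric counterpart of the visual comparison in Figure 6.3.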

Figure 6.4: Top five local flows commuting to Stockholm and top five global commuting flows.

Then, in the Colour Legend control, set the colour indicator to a regional indicator, for example Unemployment Rate. The Norwegian counties are coloured lighter than the Swedish counties, i.e., the unemployment rate is higher in Sweden than in Norway. This could be one reason why people tend to move to Norway for work (Figure 6.5).

6.2.6. Which counties attract people most?

To answer this question, we use the Flows Exploration analysis feature. Click the third button on the Exploration toolbar at the top of the application and make the glyphs visible by switching on the On/Off button. The most attractive counties are represented by circle glyphs whose sizes encode the number of commuters (Figure 6.6).

6.2.7. What is the trend of people commuting between Oslo and Stockholm?

Finally, we analyse the commuting flows between the capitals of Norway and Sweden. First, set the focus point to Oslo. Second, click the right button of the Selection Mode control in the top-left corner of the map to switch to Destination Selection Mode. Clicking Stockholm then draws the flows from Oslo to Stockholm and vice versa. Oslo is highlighted in yellow (focus) and Stockholm in purple (destination). We can now play the animation to see how these flows change over time.
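For Section 6.2.6, one plausible way to rank attractive counties is by the total number of incoming commuters, with each circle glyph scaled so that its area, rather than its radius, is proportional to that total. Both choices are assumptions made for this sketch; the text does not state how the Flows Exploration feature computes or sizes its glyphs. Region codes, counts and the `max_radius` default are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Flows = Dict[Tuple[str, str], float]  # (origin, destination) -> commuters


def attractive_regions(flows: Flows, k: int = 5) -> List[Tuple[str, float]]:
    """Rank destinations by total incoming commuters (self-loops ignored)."""
    totals: Counter = Counter()
    for (origin, dest), value in flows.items():
        if origin != dest:
            totals[dest] += value
    # Largest total first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def glyph_radius(value: float, max_value: float,
                 max_radius: float = 20.0) -> float:
    """Square-root scaling so that circle area is proportional to value."""
    if max_value <= 0 or value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


if __name__ == "__main__":
    # Illustrative values only.
    demo = {("A", "S"): 120, ("B", "S"): 80, ("S", "A"): 200, ("A", "B"): 10}
    ranking = attractive_regions(demo, k=3)
    peak = ranking[0][1]
    for region, total in ranking:
        print(region, total, round(glyph_radius(total, peak), 2))
```

With this scaling, the largest region gets `max_radius`, and a region with a quarter of that total gets half the radius, which keeps perceived circle size roughly in line with the data.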

Figure 6.5: Using regional data to explain why people tend to move to Norway for work.

Figure 6.6: Top five attractive counties across both countries.

Another way to view the trend is the trend glyph, a small bar or line chart displayed over the region polygon. Open the menu Settings - Map - Glyph and check the Glyph visible check box to turn it on. The Settings dialogue offers several options that control the glyph appearance and which flow directions are displayed; check both Incoming and Outgoing to show both directions. Figure 6.7 reveals an interesting trend over 2006 - 2008: the number of people commuting to Oslo increases, whereas the number of people commuting to Stockholm decreases.

Figure 6.7: Using glyphs to show trends.

6.2.8. Story-telling and Vislet

The analytical reasoning process used to answer all seven questions can be recorded as a visualisation story. The story can then be transformed into a vislet and embedded in a web page. I created this visualisation story and placed it on my student homepage. The whole application is available at http://www.student.itn.liu.se/~haing604/thesis/SwedenNorwayCommuting/ and my homepage, which contains the vislet, is http://www.student.itn.liu.se/~haing604/.
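The trends read from the animation and the trend glyphs in Section 6.2.7 can also be summarised numerically. The sketch below fits an ordinary least-squares slope to a hypothetical year-to-commuters series for one directed flow, labels it as increasing, decreasing or stable, and scales the yearly values to bar heights of the kind a small glyph chart might use. The series format, the `tolerance` parameter and the scaling rule are assumptions for illustration, not the application's method.

```python
from typing import Dict, List

Series = Dict[int, float]  # year -> commuters for one directed flow


def linear_trend(series: Series) -> float:
    """Ordinary least-squares slope of commuters against year."""
    years = sorted(series)
    n = len(years)
    if n < 2:
        return 0.0
    values = [series[y] for y in years]
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


def describe_trend(series: Series, tolerance: float = 0.0) -> str:
    """Label the fitted slope; |slope| <= tolerance counts as stable."""
    slope = linear_trend(series)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "stable"


def bar_heights(series: Series, max_height: float) -> List[float]:
    """Scale yearly values to bar heights for a small glyph chart, in year order."""
    peak = max(series.values(), default=0)
    if peak <= 0:
        return [0.0 for _ in series]
    return [max_height * series[y] / peak for y in sorted(series)]


if __name__ == "__main__":
    # Hypothetical series for two opposite flows between regions P and Q.
    p_to_q = {2006: 100, 2007: 110, 2008: 120}
    q_to_p = {2006: 50, 2007: 45, 2008: 40}
    print(describe_trend(p_to_q), bar_heights(p_to_q, 30.0))
    print(describe_trend(q_to_p), bar_heights(q_to_p, 30.0))
```

A least-squares slope is used here rather than simply comparing the first and last year, so that a single unusual year has less influence; with only three years, as in 2006 - 2008, the two approaches usually agree.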


Chapter 7. Conclusion and Future Work

7.1. Conclusion

The application is web-enabled, with real-time interactive features including selection, hovering, zooming, panning, filtering and linking. The clutter issue is successfully addressed using flow appearance methods and flow analysis methods. Flows are visualised aesthetically as symmetric quadratic Bézier curves with fully customisable appearance settings. Both context patterns and focus patterns are revealed while the flow visualisation remains clear. The application provides a variety of analysis tools that help users analyse flow data efficiently, including analysis of back-and-forth flows, visualisation of time-series flow data, identification of the most interesting regions and reasoning about discovered patterns. Through the story-telling mechanism, colleagues can save and share their findings while analysing flow data with the application. Visualisation stories are authored and edited with a chapter structure and support for snapshot hyperlinks. Moreover, flow map vislets can be created and embedded in ordinary web pages to facilitate the spread of knowledge. The application also supports large datasets using a divide-and-conquer method: a huge dataset is divided into smaller, semantically meaningful parts, and each part can be loaded on demand.

7.2. Future Work

Since my thesis is applied research, I suggest the following improvements to make the application more user-friendly and attractive.

1. Build an application wizard to assist in creating the application. Currently, the user needs to edit configuration files by hand, and this manual work makes the application unnecessarily difficult to use.
2. Support various data formats and provide a data conversion wizard to make the application more robust. Currently, because of the complex structure of the global flow dataset, extra pre-processing work is needed to make the data conform.
3. Extend the flow representation from two to three dimensions and render it on Google Earth or Google Maps to make it more attractive.
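The symmetric quadratic Bézier curves mentioned in Section 7.1 can be constructed by placing the single control point C on the perpendicular bisector of the segment from origin P0 to destination P1; the curve B(t) = (1 - t)^2 P0 + 2(1 - t)t C + t^2 P1 is then mirror-symmetric about that bisector. The Python sketch below does this under two assumptions not stated in the thesis: the `curvature` factor (control-point offset as a fraction of the chord length) and the choice to always bend to the same side of the travel direction. With that choice, the flows A to B and B to A bend to opposite sides and do not overlap.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def control_point(p0: Point, p1: Point, curvature: float = 0.2) -> Point:
    """Place the control point on the perpendicular bisector of p0-p1.

    The offset is curvature * chord length along the normal (-dy, dx), i.e.
    to the left of travel in a y-up frame, so opposite flows bend apart.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    if length == 0:
        return (mx, my)  # degenerate flow: origin equals destination
    nx, ny = -dy / length, dx / length  # unit normal to the chord
    offset = curvature * length
    return (mx + nx * offset, my + ny * offset)


def quadratic_bezier(p0: Point, c: Point, p1: Point, t: float) -> Point:
    """Evaluate B(t) = (1-t)^2 p0 + 2(1-t)t c + t^2 p1."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * c[0] + t * t * p1[0],
            u * u * p0[1] + 2 * u * t * c[1] + t * t * p1[1])


def flow_curve(p0: Point, p1: Point, curvature: float = 0.2,
               samples: int = 16) -> List[Point]:
    """Sample samples + 1 points along the symmetric curve from p0 to p1."""
    c = control_point(p0, p1, curvature)
    return [quadratic_bezier(p0, c, p1, i / samples)
            for i in range(samples + 1)]


if __name__ == "__main__":
    pts = flow_curve((0.0, 0.0), (10.0, 0.0))
    print(pts[0], pts[len(pts) // 2], pts[-1])
```

Because the curve's apex lies at half the control-point offset, `curvature` directly controls how far a flow bulges away from the straight line between its end points, which is one simple lever for reducing overlap between flows.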


Bibliography

[1] Q. Ho, P. Lundblad, T. Åström, and M. Jern, “A web-enabled visualization toolkit for geovisual analytics,” in Proceedings of SPIE, Visualization and Data Analysis, 2011, pp. 78680R-1–78680R-12.
[2] D. A. Keim, F. Mansmann, J. Schneidewind, and H. Ziegler, “Challenges in visual data analysis,” in Proceedings of the Conference on Information Visualization. Washington, DC, USA: IEEE Computer Society, 2006, pp. 9–16.
[3] M. Jern, “Visual statistics storytelling - explore, collaborate and publish official statistics insight and knowledge,” 2010.
[4] W. Tobler, “Flow map tutorial,” http://www.csiss.org/clearinghouse/FlowMapper/FlowTutorial.pdf (accessed 2011/06/26).
[5] D. Phan, L. Xiao, R. Yeh, P. Hanrahan, and T. Winograd, “Flow map layout,” in Proceedings of the 2005 IEEE Symposium on Information Visualization. Washington, DC, USA: IEEE Computer Society, 2005, pp. 29–.
[6] W. Cui, H. Zhou, H. Qu, P. C. Wong, and X. Li, “Geometry-based edge clustering for graph visualization,” IEEE Transactions on Visualization and Computer Graphics, vol. 14, pp. 1277–1284, November 2008.
[7] D. Guo, “Flow mapping and multivariate visualization of large spatial interaction data,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, pp. 1041–1048, November 2009.
[8] A. Rae, “From spatial interaction data to spatial interaction information? Geovisualisation and spatial structures of migration from the 2001 UK census,” Computers, Environment and Urban Systems, vol. 33, pp. 161–178, 2009.
[9] A. Dix and G. Ellis, “By chance: enhancing interaction with large data sets through statistical sampling,” in Proceedings of the Working Conference on Advanced Visual Interfaces, ser. AVI ’02. New York, NY, USA: ACM, 2002, pp. 167–176.
[10] J. J. Thomas and K. A. Cook, Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Center, 2005.
[11] M. Brown, “Content played back in Flash Player reaches 99% of Internet viewers,” http://www.adobe.com/products/player_census/flashplayer/ (accessed 2011/08/09).
