Visualization of maintenance data to facilitate analysis and promote lifetime management of gas turbines

LIU-ITN-TEK-A--15/014--SE

Jonas Petersson

Master's thesis in Media Technology

at the Institute of Technology, Linköping University

Supervisor: Katerina Vrotsou

Examiner: Aida Nordman

Upphovsrätt (Copyright)

This document is kept available on the Internet, or its future replacement, for a considerable time from the date of publication, provided that no extraordinary circumstances arise.

Access to the document implies permission for anyone to read, download and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the copyright holder. Technical and administrative solutions exist to guarantee authenticity, security and accessibility.

The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character.

For further information about Linköping University Electronic Press, see the publisher's home page

http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page:

http://www.ep.liu.se/

Abstract

This report documents the work and result of a master's thesis in Media Technology and Engineering at Linköping University conducted in collaboration with Siemens Industrial Turbomachinery AB. The aim of the project was to develop an interactive visualization application to be used for data exploration. The purpose of the application is to provide Siemens with valuable insights about how different configurations of their gas turbines affect the lifetime of the machines and their components. The result is a JavaScript-based web application, ViSITelligence, that allows employees at Siemens to explore the data in order to discover patterns and relationships between different settings in the configuration of the turbines. ViSITelligence has been developed through an agile process with usability and perception in mind, and facilitates the answering of questions and the emergence of new ones.


Acknowledgements

I would like to thank Davood Naderi, Daniel Dagnelund and Pontus Slottner, all at Siemens, for their support and feedback throughout the project. I would also like to thank my supervisor Katerina Vrotsou and my examiner Aida Nordman for their guidance throughout the work. A special thanks goes to Martina Norlin for her encouragement and endless support.

Contents

1 Introduction
1.1 Siemens
1.2 Background and problem description
1.3 Objectives
1.4 Limitations
1.5 Thesis outline
2 Theoretical Background
2.1 Perception
2.1.1 Preattentive processing
2.1.2 Gestalt Laws
2.2 Usability
2.2.1 User-interface design guidelines
2.3 Information Visualization
2.3.1 Visualization Stages
2.3.2 Data Types
2.3.3 The Visual Information Seeking Mantra
2.3.4 Visualization techniques
2.3.5 Interaction techniques
2.4 Related work
3 Implementation
3.1 The Development Process
3.1.1 Agile
3.1.2 Prototypes
3.2 Data extraction
3.3 Client side technologies
3.4 Application design
4 ViSITelligence - Results
4.1 Data
4.2 Options panels
4.3 Visualization techniques
4.3.1 Scatter plot
4.3.2 Histogram
4.3.3 Parallel sets
4.4 Coordinated representations
5 Discussion
5.1 Visualization techniques
5.2 Implementation
5.2.1 Development process
5.2.2 Application design
5.3 Interactivity
5.4 Siemens
6 Conclusions
6.1 Future work
Bibliography


1 Introduction

This report is the result of a master's thesis carried out in the Master of Science in Media Technology and Engineering program at Linköping University. The thesis has been conducted in collaboration with Siemens Industrial Turbomachinery AB [1] and describes an interactive application for visualization and analysis of turbines’ maintenance data.

1.1 Siemens

Siemens [2] is a global powerhouse in electronics and electrical engineering, actively operating in more than 190 countries and offering a wide range of pioneering products for energy efficiency, industrial productivity, affordable healthcare and intelligent infrastructure, with a quickly growing focus on sustainability.

Siemens Industrial Turbomachinery AB, SIT AB, based in Finspång, is one of the main Siemens sites in Sweden. SIT AB produces gas and steam turbines and offers a full service program for both.

1.2 Background and problem description

As part of Siemens’ service program, maintenance facts are collected from the inspections performed on operating gas turbines at different locations. The inspections can be planned or unplanned with respect to the scope or the outage time. The maintenance facts and details of the performed activities are reported regularly at the end of each maintenance event. These reports are produced from pre-defined templates, but there is no common vocabulary for the maintenance team to use. The lack of a common vocabulary for describing the findings of the inspection on the gas turbine components, and their severity, results in multiple descriptions for the same type of damage. Moreover, some inspectors might be more detailed than others in documenting the findings, which affects the data quality.

As a part of continuous improvement within the Siemens service organization, and to provide a uniform description of the collected data, a taxonomy has been developed in recent years. The aim is to implement this taxonomy to digitize the data collection. One of the objectives is to develop an interactive visualization tool that can be used to evaluate the adequacy of this taxonomy. This evaluation will be done by mapping the collected historical findings against the developed taxonomy. These historical findings are stored in a database that has been developed for this purpose over the last three years. Investigating the data quality and finding missing information are the other aspects of this evaluation.

The other important use of the interactive visualization tool is to explore the collected data to find patterns and relationships among different attributes of the configuration, the location and the specific findings during inspections. In order to improve the lifetime of the components, the engineering department is interested in finding out whether site parameters (such as salt exposure, altitude and distance from sea), configuration parameters (such as fuel type and turbine model) and other relevant factors are correlated with specific inspection findings that portray the lifetime of the turbines and their components. The lifetime is defined as the occurrence of the remarks on the components, measured by the number of planned and/or unplanned inspections and replacements in combination with the extent of the remarks.

1.3 Objectives

The main objective of the thesis is to create an interactive visualization application to represent the collected maintenance data. The purpose of the application is to allow employees at Siemens to explore the data and thereby gain valuable insights regarding the lifetime of the turbines and their components. The application will be used to answer existing questions regarding the lifetime and to raise new questions that may emerge when exploring the data.

The thesis will examine appropriate visualization techniques to be used to represent the provided multivariate data. The representations chosen should provide an overview of the data as well as a more detailed view of it. Users should be able to interact with the application to see relationships and patterns and to spot outliers in the data.

The focus of the thesis is to show the relationships primarily between categorical variables. However, the user should also, to a lesser extent, be able to discover possible relationships and correlations between quantitative variables.

1.4 Limitations

With a given time frame of 20 full-time weeks for the thesis, the application’s functionality has to be restricted. The following limitations will therefore act as constraints on the project.

• The different remarks found on the inspected components are classified into five failure modes that do not take into account the severity of the damages. The use of failure modes reduces the level of detail provided for the remarks.

• The data set is not fully complete and consists of information from only two out of five different types of gas turbines manufactured by SIT AB. Additionally, historical inspection reports are only partially reviewed and the collected facts only cover the life-supervised components.

• Only a subset of all available attributes within the database has been selected to be used by the application.

• The application is web based and developed for modern web browsers supporting scalable vector graphics (SVG). Supported browsers include recent versions of Chrome, Firefox, Safari and Opera. Internet Explorer 8 and earlier versions do not support SVG and are therefore not considered.

• Smart phones and tablets will not be considered during the development process and are out of the scope of this thesis.

• Because of time constraints, extensive usability tests are excluded. However, the supervisor and the closest stakeholders at Siemens will be questioned about the usability of the application.

1.5 Thesis outline

The thesis is structured as follows. Chapter 2 contains the theory used as a foundation for the thesis, including usability, perception and information visualization. A description of the implementation and development process can be found in chapter 3 together with the design and layout of the application. The resulting application and its functionality are presented in chapter 4, with a discussion in chapter 5. Chapter 6 consists of final thoughts and conclusions drawn from the thesis, and future work to extend and improve the application.

2 Theoretical Background

During development of any application, several design choices need to be made. Human perception and usability are two of the most important aspects to consider in order to create a user-friendly and interactive visualization application. Understanding the basics of human perception and usability is useful when choosing visualization techniques to represent different types of data. It also helps in choosing the proper interaction technique to maximize the user experience. This chapter presents the theoretical background used as a foundation for the project and introduces the theory behind the choices made for the application.

2.1 Perception

According to Johnson [3], human perception is influenced by at least three factors: the past, the present and the future. These three factors can bias our perception in any situation in multiple ways and can be explained as our experience, the current context and our goals. Experience from similar situations affects the way the current situation is perceived, according to what is expected to happen when, for example, objects or events are encountered. Placing the same object in two different contexts might lead human perception to believe that it is two different objects. For example, the length of a line may be perceived differently depending on the context in which it is used. The goals of the task at hand also bias the perception. If the focus is on finding a particular object, it is often easier to perceive it. Common to all sources of perceptual influence is that they have an impact on user-interface design [3].

2.1.1 Preattentive processing

Preattentive processing is a step in the visual selection process in which visual features are detected quickly. Preattentive processing precedes focused attention, meaning that a feature is noticed before the viewer is aware of it, and occurs when a visual feature is detected in less than 200-250 milliseconds [4]. In figure 2.1, the preattentive visual features of hue, length and density are presented. An object differing from all other objects, in terms of features, is easily distinguished and pops out from the surrounding distracting objects. For the most intense effect, the distracting objects should be identical or at least very similar [5].

Situations in which patterns do not pop out may occur when targeting an object based on two features. This is referred to as a visual conjunctive search, and particular objects are often hard to see because the primary visual cortex can only be tuned for one feature at a time. For example, trying to identify the blue square among the blue and red circles and squares in figure 2.2a can be hard. Targeting objects with the same feature set but a different orientation, or objects with colors similar to their surroundings, is also difficult [5], see figure 2.2b.

Figure 2.1: (a) Hue. The red dot gets focus immediately. (b) Length. The longest bar is recognized early. (c) Density. The high-density area receives focus first. Preattentive processing is an automatic selection process performed by human vision. Hue, length and density can be used to guide the user’s focus by taking advantage of the preattentive process.

Figure 2.2: (a) Conjunctive search. (b) Similar colors. Visual conjunctive search is the difficulty of targeting objects by two features. In (a), the blue square is not preattentively perceived because it is surrounded by blue and red circles and squares. Furthermore, an object with a color similar to the surrounding, distracting objects is not seen preattentively. In (b), the reddish circle is easy to distinguish, but the bluish circle is hard to find because the distracting objects have similar colors.

During the interaction with a representation, such as selecting or hovering with the mouse over objects, preattentive processing should be considered. The selected objects need to be quickly recognized. They can be detected preattentively by changing, for example, their color, or other features [5].

2.1.2 Gestalt Laws

Gestalt laws are robust rules describing the way human vision perceives patterns and groups of elements. One of the most useful Gestalt laws in design is the law of proximity. The law of proximity states that objects that are close to each other are perceived as a group. In figure 2.3a, three groups of circles are depicted. The principle is useful when designing control panels, where controls with different uses can be divided into groups separated by lines or extra spacing [3, 6].

The similarity principle also conveys grouping of objects, with the difference of using similarity instead of proximity. In figure 2.3b the objects are perceived as grouped in rows due to their similarity [3, 6].

Figure 2.3: (a) Proximity. (b) Similarity. The Gestalt laws of proximity and similarity are about grouping objects. In (a) the dots are perceived in three groups because of their relative spacing to each other. In (b) the objects are grouped in rows because of their similarity. The circles are perceived as one row and the stars as another and so on.

A third principle is the law of continuity which declares that it is easier for the human mind to follow smooth and continuous elements rather than elements with abrupt changes in the direction. In other words, lines are seen as following the smoothest path. Slider controls are an example of this principle where the handle is depicting a value on a single range rather than a divider between two different ranges [3, 6].

2.2 Usability

According to Krug [7], the first and most important law of usability is: “Don’t make me think” - Steve Krug.

When looking at a web page, the user should be able to understand and use it without any further thinking or trial and error. It should be obvious and self-explanatory. Hence, when developing a web application, a challenge is to get rid of all questions a user might ask. What is this? Can I click on that? Based on the fact that users scan pages instead of reading them, it is important to create a clear visual hierarchy on the page. Additionally, the web page should be divided into clearly defined areas, allowing users scanning the page to decide which areas to focus on [7].

2.2.1 User-interface design guidelines

There are several guidelines for designing user interfaces, all based on human psychology [3]. “Prevent errors” is one of the eight golden rules defined by Shneiderman and Plaisant [8]. A system should be designed in such a way that users cannot make serious errors. However, as the inscription Sigmund Freud wrote on his portrait points out, no rule protects against all errors.

“There is no medicine against death, and against error no rule has been found.” - Sigmund Freud.

One way to prevent some errors is to disable the areas that the user should not be able to interact with at a certain time. If errors do occur, one should make sure that the user receives feedback about the error and instructions for recovery. Furthermore, developers should strive for consistency in their design by using, for example, a consistent color scheme and text font throughout the application. Informative feedback should be offered to the user on interaction, such as the tooltips described in section 2.3.5. Additionally, one needs to make sure users feel they are in control of the application they are using, and minimize the short-term memory load by keeping displays simple and following the rule of thumb for information processing that humans can remember “seven plus or minus two chunks of information” [8].

2.3 Information Visualization

Visualization is derived from the word “visualize”, which means to form a mental model or mental image of something. Thus, visualization is the cognitive activity of the human brain when images or data are interpreted [9]. Recently, however, the term “visualization” has mostly been used to mean the graphical representation of data [6].

The goal of visualization is to facilitate the understanding of the data by utilizing the human visual system’s ability to find patterns and trends, as well as identify outliers. One challenge of visualization involves the creation of appropriate and well-designed visual representations which can be used to improve understanding, memorizing and decision making [10]. The use of visualization, or graphical representations of data, may aid the formation of hypotheses and the understanding of features of the data [6].

2.3.1 Visualization Stages

According to Ware [6] there are four basic stages in the process of data visualization, connected through a set of feedback loops as shown in figure 2.4. The four stages are:

• Data gathering - This stage consists of the collection and storage of data. The data is gathered from one or several sources and is part of the longest feedback loop.

• Data transformation - The data is preprocessed and transformed to reduce the amount of data. Filtering is usually part of these transformations, removing irrelevant data and possibly revealing otherwise hidden aspects of it. Other transformations may include restructuring the data into suitable data structures to ease future manipulation. This process of selecting data prior to the visual mapping is called data exploration.

• Visual mapping - Selected data is mapped to visual cues, such as position, length and area, through the use of algorithms. An example of such a mapping could be to map a pair of data values into a position in a two-dimensional space. Users are often allowed to interact with these graphical representations to get a better understanding of the data. Common user interactions are to select and highlight a subset of the data or to filter out data not fulfilling given conditions. This kind of user interaction is often referred to as view manipulation.

• Perceptual and cognitive processing - The user’s interpretation of the information involves perceptual and cognitive processing in order to gain insight and solve the task at hand.

Figure 2.4: The process of data visualization involves four basic stages, which can be combined as a pipeline. The user can interact with these stages by choosing how the data is gathered, explore the data and manipulate the view through a set of feedback loops.

Visual representations often reveal problems with the gathered data and the gathering process. Appropriate visual representations often highlight errors and artifacts, and are therefore useful for examining the quality of the data [6].
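The data transformation and visual mapping stages can be illustrated with a small sketch. It is not taken from the thesis application: the record fields (`fuel_type`, `altitude_m`, `inspections`), the sample values and the canvas size are assumptions made purely for the example. It filters a small data set and then maps each pair of values to a position in a two-dimensional space.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical attribute names, not taken from the Siemens database.
    fuel_type: str
    altitude_m: float
    inspections: int


def transform(records, fuel_type):
    """Data transformation: keep only the records relevant to the question."""
    return [r for r in records if r.fuel_type == fuel_type]


def linear_scale(value, domain, pixel_range):
    """Map a value from the data domain onto a pixel range."""
    (d0, d1), (p0, p1) = domain, pixel_range
    if d1 == d0:
        # A constant dimension is placed in the middle of the range.
        return (p0 + p1) / 2
    return p0 + (value - d0) / (d1 - d0) * (p1 - p0)


def visual_mapping(records, width=400, height=300):
    """Visual mapping: turn each pair of data values into a 2D position."""
    xs = [r.altitude_m for r in records]
    ys = [r.inspections for r in records]
    x_domain, y_domain = (min(xs), max(xs)), (min(ys), max(ys))
    # The y range is flipped because screen coordinates grow downwards.
    return [
        (linear_scale(r.altitude_m, x_domain, (0, width)),
         linear_scale(r.inspections, y_domain, (height, 0)))
        for r in records
    ]


records = [
    Record("gas", 0.0, 2),
    Record("gas", 1000.0, 6),
    Record("oil", 500.0, 4),
]
points = visual_mapping(transform(records, "gas"))
print(points)
```

Changing the filter argument and re-running the mapping corresponds to one pass through the data exploration feedback loop.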

2.3.2 Data Types

In general, data used in information visualization can be categorized into two categories - quantitative and categorical data [11].

Quantitative Data

Quantitative data, or numerical data, is data in the form of numbers that measure things. Numerical data is useless unless it is used together with its related categorical value [12]. For example, the value 449,964 is useless unless its categorical value is provided, which is the area of Sweden in square kilometers.

Categorical Data

As opposed to quantitative data, categorical data is often non-numerical. According to Few [11, 12], categorical data identifies what the quantitative data represents, and comes in three fundamental types when used in graphs: Nominal, Ordinal and Interval.

• Nominal - Items in a nominal scale are discrete values without an intrinsic order and differ only in their names (that is, nominally). The items in a nominal scale do not relate to one another in any particular way although they belong to a common category. Examples of nominal scales are fruits (e.g. apples, bananas, oranges) or regions (e.g. Sweden, USA, Russia). As stated by Yau [13], numbers can be used with nominal scales in some cases, such as the number on a bus representing the route on which it travels.

• Ordinal - Ordinal scales consist of items with an intrinsic order but, as for nominal data, the individual items do not represent quantitative data. Examples of ordinal scales involve rankings such as “First, Second, Third” or “Small, Medium, Large”. Listing items in an ordinal scale out of sequence does not make sense and would create confusion.

• Interval - Items in an interval scale have an intrinsic order like the items in an ordinal scale, but for interval scales they represent quantitative values. An interval scale is a quantitative scale that has been converted into a categorical scale by grouping the values into smaller ranges of equal size. Interval scales often represent units of time, such as year and month, although years and months are not always of equal size.
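The conversion of a quantitative scale into an interval scale can be sketched as follows. The function name, the range labels and the altitude values are made up for the illustration; the only thing taken from the text is the idea of grouping values into ranges of equal size.

```python
def to_interval_scale(values, start, width, count):
    """Assign each value to one of `count` equal-width ranges beginning at `start`."""
    labels = [
        f"{start + i * width}-{start + (i + 1) * width}" for i in range(count)
    ]
    result = []
    for value in values:
        index = int((value - start) // width)
        # Clamp so that values on or beyond the edges fall into the outer ranges.
        index = max(0, min(count - 1, index))
        result.append(labels[index])
    return result


altitudes = [12, 480, 510, 1999]  # made-up altitudes in metres
print(to_interval_scale(altitudes, start=0, width=500, count=4))
```

The resulting labels keep their intrinsic order, so they can be used like an ordinal scale while still referring to quantitative ranges.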

2.3.3 The Visual Information Seeking Mantra

A basic principle to follow when creating visual representations is the Visual Information Seeking Mantra: “Overview first, zoom and filter, then details-on-demand”, by Shneiderman [14]. Craft and Cairns [15] discuss the importance of this mantra. An overview provides a general display of the dataset, allowing users to get an understanding of the data and spot relationships and patterns. Zoom and filter allow for a simplified view, by selecting or deselecting subsets of the data to be shown or removed from the view. Zooming and filtering can reduce the complexity of the display, assisting in further investigation. Details-on-demand can be provided on mouse-over or selection of elements. When the user hovers over elements of a representation with the mouse cursor, additional or detailed information about the selection can be shown in a tooltip, as described in section 2.3.5.

2.3.4 Visualization techniques

Data can and should be represented in different ways, depending on the data itself and the message to present. Common to all visualization techniques, though, is that the data values are mapped to visual attributes such as position, size, shape and color. The human brain is better at decoding some visual attributes than others. The visual attributes of position and length are more accurate encodings than area and color [9, 10, 16].

A set of techniques used for multivariate data visualization will be presented in this section. Scatter plots, scatter plot matrices and parallel coordinates are often used for quantitative data while the parallel sets and mosaic plot representations are mainly used for categorical data. Other techniques for representing multivariate categorical data exist, such as treemaps [10, 17], sunburst [10] and icicle plots [10]. However, they are mostly used for displaying hierarchies so they are not discussed in this thesis.

Bar chart

A bar chart, see figure 2.5a, uses length to encode quantitative data. Each rectangle represents an item in the dataset, where its height represents the measured value for the category. Bar charts are most often used to display discrete data for comparison of multiple categories [17].

Variations of bar charts exist. By grouping bars with the same categorical variable, see figure 2.5b, multiple measures can be displayed for each category. Bars can also be stacked on top of each other, showing the relationship of each part to a whole. Stacked bar charts, see figure 2.5c, are considered when both the total and its parts are important for the message presented by the graph.

A third variation of the bar chart is called a histogram. A histogram, see figure 2.5d, shows the distribution of a dataset by grouping measures into bins, or ranges of equal size, and counting their occurrences. In a histogram, each bar represents a range and the height of each bar represents the number of occurrences within that range. Histograms can help in finding clusters and spotting outliers in a dataset [16].
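The counting step behind a histogram can be sketched in a few lines. This is a minimal illustration with made-up values, not the implementation used in the application; the function name and the text rendering are assumptions.

```python
def histogram(values, bin_count):
    """Count how many values fall into each of `bin_count` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for value in values:
        index = int((value - lo) / width) if width > 0 else 0
        # The maximum value belongs to the last bin rather than a new one.
        index = min(index, bin_count - 1)
        counts[index] += 1
    return counts


values = [1, 2, 2, 3, 3, 3, 9, 10]  # made-up measurements
counts = histogram(values, 3)

# Render each bin as a row of '#' characters, one per occurrence.
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```

In this example the empty middle bin separates a cluster of small values from two large ones, which is the kind of structure a histogram makes visible.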

Location map

When the geographic location of a data point is of importance, a location map can be used. In a location map, the data points are placed on a map according to their corresponding latitude and longitude values. One additional dimension can be added by encoding a variable as the size of the dots [16].
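As a rough illustration of placing data points by latitude and longitude, the sketch below uses an equirectangular projection. The projection, the function name and the map size are assumptions made for the example; the text does not say which projection a location map should use.

```python
def project(lat, lon, width, height):
    """Equirectangular projection of (lat, lon) in degrees to map pixels."""
    if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
        raise ValueError("latitude or longitude out of range")
    x = (lon + 180.0) / 360.0 * width
    # Latitude grows northwards while screen y grows downwards.
    y = (90.0 - lat) / 180.0 * height
    return x, y


# The point at latitude 0, longitude 0 lands in the centre of the map.
print(project(0.0, 0.0, width=360, height=180))
```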

Scatter plot

A scatter plot, see figure 2.6, is a visualization technique used to represent data in a two-dimensional space. Scatter plots use the most accurate visual attribute, namely position, to encode data values. Shapes representing the data points are positioned in a Cartesian coordinate system according to their values for each axis, where each axis represents a quantitative scale. By using different colors, sizes and shapes, additional dimensions can be displayed in the representation. A scatter plot representing bubbles of different sizes is called a bubble chart [17]. When size is used to represent a dimension, the bubbles should be sized by their area and not by their radius, diameter or circumference [13].
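Sizing bubbles by area means that the radius has to grow with the square root of the value. The following sketch shows the arithmetic; the function name and the choice of a maximum radius as the reference are assumptions made for the illustration.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius of a bubble whose area is proportional to `value`."""
    if max_value <= 0 or value < 0:
        raise ValueError("values must be non-negative and max_value positive")
    # area = pi * r^2, so r must scale with sqrt(value), not with value.
    return max_radius * math.sqrt(value / max_value)


r_full = radius_for_value(100, 100, max_radius=20)
r_quarter = radius_for_value(25, 100, max_radius=20)
print(r_full, r_quarter)
```

A value that is a quarter of the maximum gets half the radius, so its area is a quarter of the largest bubble. Scaling the radius linearly instead would make it look sixteen times smaller rather than four.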

Scatter plots can be used to determine whether there is a correlation between two dimensions in the data, that is, whether an increase in the value of one dimension is accompanied by a corresponding increase or decrease in the value of the other. They can also be used to find outliers, data points differing from all other points, in the data [17].
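A scatter plot lets the viewer judge correlation visually. A numeric counterpart, which the text does not discuss and which is added here only as an illustration, is the Pearson correlation coefficient. The sample values below are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Both dimensions increase together: coefficient close to +1.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
# One dimension decreases as the other increases: coefficient close to -1.
print(pearson([1, 2, 3], [3, 2, 1]))
```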

Figure 2.5: (a) Bar chart. (b) Grouped bar chart. (c) Stacked bar chart. (d) Histogram. Different variations of bar charts exist. Common to all is that each object of a dataset is represented by a rectangle whose height depicts its measure for the variable on the y axis. In (a), a bar chart shows three objects. In (b), rectangles are grouped in pairs, representing two features of the same object. In (c), bars are stacked on top of each other, representing parts of a whole. The histogram in (d) shows the distribution of a variable, with the number of occurrences as the height of each bar. The extent of possible values for the variable has been divided into twenty equally sized intervals with one rectangle per interval.

Figure 2.6: The scatter plot is a multi-dimensional visualization technique displaying a circle for each data item. The positions of the circles are determined by the values on the x and y axes. Color is added to represent an additional dimension of the dataset.

Scatter plot matrix (SPLOM)

[...] correlation and relationships between data points, but not limited to only two dimensions (provided that no additional dimensions are mapped to, for example, the size). In a SPLOM, the correlation between any pair of variables can be inspected. Patterns in the pairwise relationships are easily observed, but with higher dimensions some patterns may go unrecognized [10, 18].

Figure 2.7: The SPLOM consists of multiple scatter plots organized as a matrix. Each scatter plot represents the relationship between a distinct pair of variables. Each relationship is shown twice, mirrored above and below the diagonal that runs from top left to bottom right.

One disadvantage of the SPLOM is that when the number of dimensions increases, the number of different pairwise relations increases rapidly. Also, the number of data points needed is multiplied by the number of unique pairwise relations. For example, 600 plotted points are required to visualize the pairwise relationships for 100 data points in four dimensions, which means 100 points for each of the six unique pairwise combinations of the dimensions [9].
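The arithmetic behind this example can be checked with a short sketch. The dimension names are placeholders; only the counts come from the text.

```python
from itertools import combinations


def splom_cost(dimensions, n_points):
    """Return the unique dimension pairs and the number of plotted points."""
    pairs = list(combinations(dimensions, 2))
    return pairs, len(pairs) * n_points


# Placeholder dimension names; four dimensions and 100 data points as in the text.
pairs, plotted = splom_cost(["a", "b", "c", "d"], 100)
print(len(pairs), plotted)

# The number of pairs grows quadratically: d * (d - 1) / 2 for d dimensions.
for d in (4, 8, 16):
    print(d, len(list(combinations(range(d), 2))))
```

Doubling the number of dimensions roughly quadruples the number of scatter plots, which is why the matrix becomes hard to read quickly.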

The interaction techniques of brushing and linking, which will be described in section 2.3.5, can be applied to highlight interesting points in all views, and thus limit the number of data points preattentively focused in the different views [18].

An alternative to using SPLOMs for representing multivariate data is to use a single interactive scatter plot with the possibility to change what is represented by the axes and to filter the data [9]. However, using this approach, the relationships of the different pairs [...]

Parallel coordinates

Parallel coordinates, see figure 2.8, is a technique used to represent multidimensional data. The data are turned into sets of points with each point representing one dimension of the dataset. The points for each set are placed on uniformly spaced parallel axes (instead of orthogonal axes, like scatter plots), one for each dimension, and are connected by line segments creating a polyline for each set [19, 20]. The parallel axes are independent of each other, making it possible to display up to about 10-15 axes in the same view [19]. Using 10-15 axes would, however, violate the design guideline of minimizing the short-term memory load, described in section 2.2.1.

Figure 2.8: Parallel coordinates representation showing one polyline for each set of points connected to a particular data dimension. Colors are used to add an additional dimension to the plot, dividing the polylines into two different classes.

Parallel coordinates should not be thought of as a normal line graph where the slope of the lines indicates change through time. Instead, the lines connect a series of data points that measure multiple aspects of an entity, such as a fruit or region [21].

Limitations of the technique include the difficulty of analyzing correlation between axes that are not adjacent, due to their parallel placement. Parallel coordinates representations also often suffer from clutter, which appears already for medium-sized datasets and results in an image that is hard to analyze for trends or structure [20, 22]. Instead, one of the strengths of the technique is its interactive use for analysis with techniques like brushing [20], described in section 2.3.5. Another advantage of the parallel coordinates technique is its capability to present many related dimensions in a limited space. Additionally, in the absence of clutter, relationships between results can easily be investigated and trends in the data become visible [20].

Parallel sets

Parallel sets, see figure 2.9, is a technique and interaction framework for mapping multidimensional categorical variables to visual entities. Parallel sets are influenced by parallel coordinates and thus share their layout, with the difference that the point intersections are replaced with sets of lines, originally boxes, representing the categories. The length of each line corresponds to the frequency of the category it represents. By displaying the frequencies in a discrete design model and having independent axes, the parallel sets implementation combines the advantages of both frequency-based techniques and parallel coordinates [19, 23].
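The frequency-based mapping can be illustrated with a small sketch. The function names and the sample records are hypothetical; the sketch only computes the share of an axis given to each category and the counts behind each flow, not the drawing itself.

```python
from collections import Counter

def category_bars(records, dimension):
    """Fraction of the axis length given to each category of one dimension."""
    counts = Counter(r[dimension] for r in records)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def ribbons(records, dimensions):
    """Number of records for each unique combination of categories."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)

records = [
    {"fuel": "gas", "planned": "yes"},
    {"fuel": "gas", "planned": "no"},
    {"fuel": "gas", "planned": "yes"},
    {"fuel": "liquid", "planned": "no"},
]
bars = category_bars(records, "fuel")
flows = ribbons(records, ["fuel", "planned"])
```

Here `bars` gives the relative line length per category and `flows` gives the width of each flow between the two axes.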

Figure 2.9: Parallel sets representation with a ribbon for each unique combination of categories. The blue and orange colors represent the categories of the uppermost dimension (Dimension1). This makes it easier to distinguish between the flows corresponding to the different categories of the top dimension.

Using parallel sets in combination with a selection feature allows users to deselect categories and dimensions to get a more detailed view of the interesting selection, which follows the principle of the Visual Information Seeking Mantra [24] described in section 2.3.3. A parallel sets implementation provides an overview of the data and its flow or distribution among the categories for multiple dimensions (see figure 2.9). By selecting or deselecting categories or dimensions, the data is filtered and the representation shows only the selected data. Details-on-demand are added by showing details of a selection when users hover over a flow in the representation. The possibility to rearrange axes lets users view the overview in different arrangements.

One drawback of the parallel sets implementation arises when there are several categories for a dimension or when they differ a lot in size. In those cases the many intersections can make it hard to interpret and compare the different categories [19].


Mosaic plot

Like parallel sets, mosaic plots use frequency-based techniques for representing multivariate categorical data. The mosaic plot, see figure 2.10, is a recursive space-subdivision technique [25], which means that frequency measures of a category are divided into subcategories. For example, a dataset containing information about an accident, with the categories survived and gender, can be divided into the subcategories yes/no and male/female respectively. Each frequency value is mapped to the area of a rectangle, as opposed to parallel sets where each value is mapped to the length of a line.

Multiple dimensions can be added to the representation, increasing the number of rectangles displayed. With an increased number of dimensions, spacing is needed between the rectangles to group the connected combinations of categories within dimensions. Displaying more variables makes the plot more difficult to interpret because the human brain has difficulty distinguishing differences in area, especially when the areas are not aligned to a common baseline and differ in aspect ratio [26].
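The recursive subdivision can be sketched for two dimensions as follows. This is a minimal illustration assuming a unit square that is first split into columns and then into rows within each column; the function name `mosaic` and the sample records are hypothetical, and spacing between rectangles is left out.

```python
from collections import Counter

def mosaic(records, col_dim, row_dim):
    """Return {(col_category, row_category): (x, y, width, height)} in the unit square."""
    total = len(records)
    col_counts = Counter(r[col_dim] for r in records)
    rects, x = {}, 0.0
    for col_cat, col_n in col_counts.items():
        width = col_n / total  # first split: column width follows category frequency
        row_counts = Counter(r[row_dim] for r in records if r[col_dim] == col_cat)
        y = 0.0
        for row_cat, n in row_counts.items():
            height = n / col_n  # second split: share of the subcategory within the column
            rects[(col_cat, row_cat)] = (x, y, width, height)
            y += height
        x += width
    return rects

records = [
    {"gender": "male", "survived": "no"},
    {"gender": "male", "survived": "no"},
    {"gender": "male", "survived": "yes"},
    {"gender": "female", "survived": "yes"},
]
rects = mosaic(records, "gender", "survived")
```

The area of each rectangle (width times height) equals the relative frequency of the corresponding combination of categories.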

Figure 2.10: Mosaic plot with each rectangle representing the frequency of a pair of categories from two dimensions in a dataset. The categories of one of the dimensions are divided into columns, while the categories of the other dimension are grouped by color.

2.3.5 Interaction techniques

Filtering

Filtering allows a subset of the data to be selected and focused on in the view. Data points that do not fulfill the criteria of the filter are removed from the view. This reduces the cognitive effort that the user would otherwise need to focus on a subset of the data while all data points are visible [9].


Brushing and linking

Brushing is a technique for selecting data interactively with the mouse, for example in order to filter or highlight it. Brushing is commonly combined with linking, which allows the selection to be displayed in other views of the same dataset [20].
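The combination of brushing and linking can be sketched as two small steps: a rectangular brush picks out identifiers in one view, and a second view of the same dataset highlights the rows with those identifiers. This is a simplified sketch; the function names, the field names and the sample data are hypothetical.

```python
def brush(points, x_range, y_range):
    """Return the ids of the points inside a rectangular brush."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {p["id"] for p in points if x0 <= p["x"] <= x1 and y0 <= p["y"] <= y1}

def link(view_rows, selected_ids):
    """Mark the rows of another view of the same dataset as highlighted or not."""
    return [dict(row, highlighted=row["id"] in selected_ids) for row in view_rows]

scatter_points = [
    {"id": 1, "x": 1, "y": 1},
    {"id": 2, "x": 5, "y": 5},
    {"id": 3, "x": 2, "y": 3},
]
other_view = [{"id": 1, "fuel": "gas"}, {"id": 2, "fuel": "liquid"}, {"id": 3, "fuel": "gas"}]

selected = brush(scatter_points, x_range=(0, 3), y_range=(0, 4))
linked = link(other_view, selected)
```

Sharing only the identifiers between views keeps the views independent of each other's layout.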

Tooltip

A tooltip, see figure 2.11, is an information box showing additional information about a selection when the user selects or hovers over an element in a representation. Tooltips can be used to show details of the selection, such as the exact values it represents.

Figure 2.11: A tooltip shown when hovering with the mouse over an element in a scatter plot. The tooltip provides detailed information about the element and its value on each axis.

Zooming and panning

Zooming gives different levels of detail to a visualization technique. By zooming, a subset of the data set can be displayed more accurately revealing information that otherwise may be hidden when displayed as an overview [20]. Panning is often used in combination with zooming to allow users to change the area of the representation currently zoomed.

2.4 Related work

Applications for representing multidimensional data have been created before. The applications discussed here were chosen because they help users discover valuable information and patterns in multidimensional data using visualization techniques. The visualization techniques used in these applications are either similar to the ones chosen for the application presented in the thesis or applicable to the datasets used. Several studies involving the representation of multidimensional categorical data have been conducted [24, 27]. These studies describe applications representing categorical data using parallel sets. The applications include several features that can be useful for representing multidimensional data. In one of the studies [27], aggregated cancer registry data is presented in a parallel sets representation implemented for the web. The representation uses curved flows to increase traceability and also allows users to highlight a flow of interest. However, the application lacks the possibility of filtering among dimensions and categories to remove parts of the data that are uninteresting for the current task. The other study [24] uses an application developed in Java for presenting the result of a service awareness campaign, the steps of a record cleanup process and profiling data from a bank. This application provides filtering functionality for exploring subsets of the dataset but, as opposed to the web application in the former study, the ribbons are straight instead of curved. Additionally, a Java application needs to be installed on all clients that will use it. The applications mentioned in the studies above use only one type of representation (parallel sets) to present the data, which limits the possibilities to display multiple aspects of the dataset, a key requirement for this project.

Another application for multidimensional data visualization, also implemented in Java, is PRISMA [28]. PRISMA incorporates the use of coordinated views to represent multiple aspects of a dataset. With coordinated views, interactions in one view are extended to the other views to display or highlight parts of the data. This is a useful feature for data exploration and fulfills the Visual Information Seeking Mantra presented in section 2.3.3. The visualization techniques implemented in the application are scatter plot, treemap and parallel coordinates. Treemap is a space-subdivision technique (like mosaic plots, described in section 2.3.4) that represents hierarchical data. Using a treemap for categorical data without a hierarchy would require multiple rectangles for the same category, as sub-areas of a larger area. The parallel coordinates representation has its strength in showing relationships and correlation between quantitative data attributes. With the primary focus of this thesis being the representation of categorical data, there are better alternatives.

Mondrian [29] is a Java application for data visualization. Mondrian offers mosaic plots and parallel coordinates for high-dimensional data visualization together with standard plots like histograms, bar charts, scatter plots and maps. Similar to PRISMA, it uses coordinated views to allow for advanced data analysis. The disadvantages of a Java application and of parallel coordinates have already been discussed. In addition to a traditional parallel coordinates representation, Mondrian offers a version that uses boxplots as the axes to display how the values are distributed over each axis. A boxplot is not very effective for categorical data when there are only a few possible values. The mosaic plot is useful when displaying up to four or five dimensions. When more dimensions need to be displayed, it gets harder to interpret the area of each rectangle and to keep in mind what dimension and category it represents. Mosaic plots often use tooltips instead of labels around the plot because the space needed for the labels increases with the number of dimensions. Moreover, if users want to know the total value of a category as part of the whole, they need to sum the values of all rectangles representing the category. This total has to be provided by the tooltip or by another visualization technique, for example a bar chart. Histograms, bar charts and scatter plots are useful for comparing values of different entities and for spotting outliers and correlation. With limited space, a large number of entities is hard to display in a bar chart. Mondrian uses scroll bars to allow users to scroll through all the rectangles. With different types of sorting some interpretation is possible, but an overview is still missing, as is the possibility to compare rectangles that are not positioned close together. An alternative provided by the application is histograms, which show a distribution as described in section 2.3.4 about bar charts.

The maps provided are choropleth maps, which use color to map a variable to a region on the map. For the purpose of the thesis, choropleth maps are not relevant because the location (latitude and longitude) of a gas turbine is not of major importance, and its region even less so. More important are the parameters of the location, like distance from sea and salt exposure.

Multidimensional datasets for car specifications have been visualized in applications using parallel coordinates or star coordinates [30, 31]. Star coordinates have been found useful for gaining insight into clustered datasets. Due to the radial placement of the axes, the exact data values for points in the representation are hard to interpret. The axes of a star coordinates representation are arranged on a circle with the origin at the center of the circle. Initially they are separated by equal angles, but through interaction both the length and orientation of the axes can be changed so that they are no longer equally spaced. Due to the circular arrangement of the star coordinates representation, the space needed is independent of the number of dimensions. For parallel coordinates, the size increases linearly with the number of dimensions if the distance between axes is kept the same. Star coordinates have also been used for a dataset containing ratings of American cities for a number of criteria [31].
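The placement of a data point in star coordinates can be sketched as follows. The sketch assumes the common formulation in which a point is positioned at the sum of its normalized values along each axis direction; the function names and the sample values are hypothetical and are not taken from the cited applications.

```python
import math

def star_axes(n_dims):
    """Unit vectors for n axes separated by equal angles around the origin."""
    return [(math.cos(2 * math.pi * i / n_dims), math.sin(2 * math.pi * i / n_dims))
            for i in range(n_dims)]

def project(values, lows, highs, axes):
    """Place one data point at the sum of its normalized values along each axis."""
    x = y = 0.0
    for value, low, high, (ax, ay) in zip(values, lows, highs, axes):
        t = (value - low) / (high - low)  # normalize to [0, 1]; assumes high > low
        x += t * ax
        y += t * ay
    return x, y

axes = star_axes(4)
lows, highs = [0, 0, 0, 0], [1, 1, 1, 1]
p1 = project((0, 0, 0, 0), lows, highs, axes)
p2 = project((1, 0, 1, 0), lows, highs, axes)
```

In this example `p1` and `p2` both land at (approximately) the origin although the underlying values differ, which illustrates why exact values are hard to read from the representation.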


3 Implementation

This chapter describes the development process and the technologies used to extract the data and to create the visual representations of the web application. As mentioned in section 1.3, the aim of the application is to allow users to explore the data to gain valuable insights regarding the lifetime of the turbines and their components. The design of the application is also described together with how the golden rules for design presented in section 2.2.1 are considered.

3.1 The Development Process

3.1.1 Agile

The application has been developed using an agile development methodology [32]. Unlike traditional development methodologies, such as the waterfall model, where the process takes place in stages with the previous stage finalized before continuing to the next, agile methodologies are more dynamic. Dynamic methodologies are less vulnerable to changes late in the process, because the development takes place iteratively [32].

A product backlog was created from a set of general user stories based on the needs of the application. The user stories were broken down into tasks to be implemented. The tasks were rated according to their importance to the application and each week, the highest prioritized tasks from the product backlog were selected. The selected tasks were broken down into smaller tasks, which were time estimated and conducted during a sprint. At the end of each weekly sprint (iteration), a working prototype was produced, which was incrementally extended and improved during later sprints. During the weekly meetings the prototype was reviewed and possible changes and improvements were discussed. The changes and improvements appearing during the process were added to the product backlog to be selected in a later sprint.

3.1.2 Prototypes

Lo-Fi prototypes were drawn in an early stage of the development process. In this project, the prototypes of the layout (figure 3.1) focus on the concept of the layout and the functionality of the option panels. The Lo-Fi prototypes were used in combination with interactive prototypes that were shown to the closest stakeholders at Siemens in order to present the different possibilities. The use of prototypes facilitates planning the structure of the application and selecting the visualization techniques to be used. Figure 3.2 shows the final Lo-Fi prototype of the visualization application, which was used as a template when implementing the application.


(a) Application layout.

(b) Application layout with option panels visible.

Figure 3.1: Lo-Fi prototypes of the application layout designed in the early stages of the development process. The prototypes represent the layout of the application with (a) the option panels hidden and (b) shown.

3.2 Data extraction

The first and second stages in the process of data visualization (section 2.3.1) concern the collection and transformation of data. The data used by the application is extracted from a Microsoft SQL Server database using the web server Internet Information Services (IIS) [33] and ASP.NET Web Pages [34]. The data extraction is written in ASP.NET with Razor syntax [35], which includes SQL query functionality. Razor syntax is a server-side programming syntax for embedding server-based code into web pages. Figure 3.3 illustrates the process of the data extraction, starting with the client’s request. The process continues on the server, which handles the request and sends a response back to the client.

Figure 3.2: Lo-Fi prototype of the application constructed and refined during the development process.

Figure 3.3: The process of extracting data from the server. The client requests data when entering a page. The web server handles the request and lets ASP.NET process the page. A code snippet from the markup code of ASP.NET with Razor Syntax is called to execute a SQL query to select the data from the database. The database processes the query, and returns the data which is formatted as JSON by the ASP.NET with Razor Syntax code before sending the response to the client.
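The query-and-serialize step of this process can be illustrated with a short sketch. Note that this is not the ASP.NET and Razor code used by the application: it is a Python stand-in that uses an in-memory SQLite database instead of Microsoft SQL Server, and the table name, column names and function name are hypothetical.

```python
import json
import sqlite3

def rows_as_json(connection, query):
    """Run a query and serialize the result rows as a JSON array of objects."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor.fetchall()])

# In-memory stand-in database with hypothetical table and column names.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE machines (machine_id INTEGER, fuel_type TEXT)")
db.executemany("INSERT INTO machines VALUES (?, ?)", [(1, "gas"), (2, "liquid")])

payload = rows_as_json(db, "SELECT machine_id, fuel_type FROM machines ORDER BY machine_id")
```

The resulting `payload` string corresponds to the JSON response that the client would receive and pass on to the visual representations.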


3.3 Client side technologies

The application uses several technologies to create and structure the content and visual representations. The structure of the web application is created in HTML5, which is the most recent web standard of the markup language HTML (HyperText Markup Language) [36].

The style and design of the web page is written in CSS (Cascading Style Sheets) [37] that can access elements created by a markup language, such as HTML5, and change the design (for example color, placement and size) of the elements.

To create the visual representations, the JavaScript library D3.js (Data-Driven Documents) [38] was used. D3 allows for manipulation of documents through the Document Object Model (DOM) [39] based on data.

The general approach with D3.js is to create SVG elements that, when combined, form the graphical representations. SVG, Scalable Vector Graphics, is a format for creating two-dimensional images with support for animation and interaction. D3 provides functionality (such as mouse events for interaction) and components (such as axes and scales) ready to be used in the application, as well as the possibility of customizing the visualizations. The functionality and components are combined to create different graphical representations.

Furthermore, jQuery [40] and jQuery UI [41] were used to provide additional functionality and components to the web page. jQuery is a fast and feature-rich JavaScript library that provides functionality for animation and event handling. jQuery UI is a set of user interface interactions and widgets built on top of jQuery. One of the many widgets offered by jQuery UI is a range slider, which allows users to filter data within a selected interval.

3.4 Application design

The general design and layout of the application has been constructed to support and follow guidelines for usability (see section 2.2) and facilitate user perception and interaction, as discussed in section 2.1. The application is limited to the size and resolution of the monitor, meaning that the representations and their option panels should fit on one page with the scroll bars disabled.

The page is divided into three clearly defined areas, allowing users to scan the page and decide what area to focus on. The different areas of the application are separated by a light gray color, as seen in figure 3.4. Additionally, to create a clear visual hierarchy of the page, the option panels are hidden from the start, showing only buttons to toggle their visibility. The buttons are positioned in corners of each area away from adjacent areas, making clear what area each option panel belongs to. The one exception is the lower right area, where the button is placed in the top right corner to be consistent with the uppermost area to the right. With the option panels hidden, visual noise is reduced, guiding the users’ attention to the representations.

Furthermore, it should be obvious how to interact with each element. By changing mouse cursor the user is informed about what actions are possible for the element in focus. The different cursors used in the application, together with their associated interactions are displayed in figure 3.5.

Five of the golden rules defined by Shneiderman and Plaisant, section 2.2.1, have been taken into account when designing the application:


Figure 3.4: The page is divided into three clearly defined areas, each containing one representation. The button to toggle the visibility of each option panel is located in a corner of the area it belongs to.

Figure 3.5: Mouse cursors used by the application to guide user interaction and their related actions.

• Prevent errors - To prevent errors, an intelligent filtering functionality is available. When filtering out data points for one parameter leaves no data points for some options of other parameters, those options are disabled so that users cannot select or deselect them. Furthermore, the parallel sets representation requires at least two dimensions with at least one category each. By disabling controls for the filtering scenarios that would violate this, errors occurring when no data is selected can be prevented. To prevent errors during the initialization process of the application, a loading screen with information about the application is visible until all data has been loaded and the representations are created.

• Strive for consistency - By using a common color scheme for the representations, the design is more consistent. Moreover, a uniform use of the mouse cursors throughout the application and a uniform design of control panels and tooltips also contribute to consistency.

• Informative feedback - Informative feedback can be connected to the third part of the Visual Information Seeking Mantra: Details on demand, section 2.3.3. When users interact with the application and hover over objects, they are provided with tooltips containing detailed information. Additionally, the representations are coordinated and updated when the data is filtered, giving users immediate feedback on their action. Filtering data for one representation affects all other representations accordingly.

• Make sure users feel they are in control - The second part of the Visual Information Seeking Mantra: Zoom and filter, section 2.3.3, is related to the feeling of control. Initially users are given an overview of the data. By providing option panels with filtering functionality and the ability to zoom in and out, users are able to simplify the view, showing only data that is relevant for the task at hand. Furthermore, with the opportunity to change the parameters of the axes and their arrangement, users are given control over how the data is shown.

• Minimize the short-term memory load - In graphs providing a legend, the number of items is limited according to the rule of thumb for information processing, section 2.2.1. The legends are sized dynamically according to the number of items. However, to minimize the short-term memory load, users are limited to selecting dimensions with at most seven categories, so that there are at most seven different colors and corresponding labels to keep in mind when investigating the representations.
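The error-prevention rules in the list above can be sketched as two checks. This is a minimal illustration of the idea and not the application's implementation; the function names, field names and sample records are hypothetical.

```python
def selectable_categories(records, active_filters, dimension):
    """Categories of `dimension` that still have data under the other active filters."""
    remaining = [
        r for r in records
        if all(r[d] in allowed for d, allowed in active_filters.items() if d != dimension)
    ]
    return {r[dimension] for r in remaining}

def parallel_sets_valid(selection):
    """True if at least two dimensions are selected, each with at least one category."""
    non_empty = [d for d, categories in selection.items() if categories]
    return len(non_empty) >= 2

records = [
    {"fuel": "gas", "planned": "yes"},
    {"fuel": "liquid", "planned": "no"},
]
# With only gas-fueled machines selected, the category "no" has no data left
# and its control would be disabled.
enabled = selectable_categories(records, {"fuel": {"gas"}}, "planned")
```

A control whose deselection would make `parallel_sets_valid` return False would likewise be disabled.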


4 ViSITelligence - Results

The result of the thesis is an interactive visualization application that fulfills the objectives presented in section 1.3. Images shown in this chapter are not created with the maintenance data described in the next section due to confidentiality.

4.1 Data

The application uses two datasets where data objects of the datasets are connected by an identification number for the gas turbine they are associated with. A gas turbine is also referred to as a machine. A simplified view of the retrieved data attributes is found in figure 4.1.

Figure 4.1: Five types of data attributes have been retrieved. Climate, site and configuration data for general information about a machine and its location. Inspection data for the general inspection information and inspected components data for maintenance findings on components.

One dataset contains more than two hundred machines with specific information about their location and configuration. The dataset consists of information about the altitude and distance from sea of the machines’ location. It also contains the configuration settings of the machine, such as the turbine model and fuel type used. For each machine, data collected during inspections is also provided, which includes the time the machine has been in operation, among other things. Additionally, the number of inspections, both planned and unplanned, and the number of removed components for each machine are provided. The dataset contains primarily quantitative variables. However, some categorical variables are provided to be able to group machines by, for example, coloring the visual entities connected to the machine according to its turbine model or fuel type.


As opposed to the first dataset, the second dataset includes mostly categorical variables. The second dataset contains information from the inspections of the machines. Both inspection and component specific information is provided, together with configuration settings for the machines and whether the machine is exposed to high levels of salinity. The inspection specific information is, for example, whether the inspection was planned or unplanned. Component specific data concerns the findings when inspecting the components. This includes the remarks found on the components and a judgment from the inspector about the findings and whether the components can remain in operation or need to be replaced. Configuration settings are the same for this dataset as for the first. A total of ten categorical variables are selected and grouped into unique combinations of the values for the variables, together with an identifier for the machine they are connected to. For each unique combination, the accumulated number of inspected components is presented. Each of the ten variables, or dimensions, has between two and eight different values.
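The grouping into unique combinations can be sketched as follows. The sketch uses two categorical variables instead of ten, and the field names, function name and sample values are hypothetical since the real data is confidential.

```python
from collections import Counter

def aggregate(findings, dimensions):
    """Accumulate inspected components per machine and unique combination of categories."""
    totals = Counter()
    for finding in findings:
        key = (finding["machine_id"],) + tuple(finding[d] for d in dimensions)
        totals[key] += finding["components"]
    return totals

findings = [
    {"machine_id": 7, "planned": "yes", "judgment": "ok", "components": 3},
    {"machine_id": 7, "planned": "yes", "judgment": "ok", "components": 2},
    {"machine_id": 7, "planned": "no", "judgment": "replace", "components": 1},
    {"machine_id": 9, "planned": "yes", "judgment": "ok", "components": 4},
]
totals = aggregate(findings, ["planned", "judgment"])
```

Each key of `totals` is one unique combination for one machine, and the value is the accumulated number of inspected components for that combination.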

4.2 Options panels

The Gestalt laws of proximity, similarity and continuity, described in section 2.1.2, are considered when designing and positioning the controls in the options panels, as shown in figures 4.3, 4.4 and 4.5. Controls affecting the same parameter of the representation, or with similar usage, are grouped together by separating them from other controls with extra space. The law of continuity is utilized through the use of slider controls to depict a value or interval within the range of the control. The content of each options panel is described thoroughly in the next sections, dedicated to the different visualization techniques.

4.3 Visualization techniques

Figure 4.2 shows an example of the application’s interface. This interface can be used to gain a deeper understanding of the raw data described in section 4.1. Therefore, multidimensional visualization techniques are used to provide users with as much information as possible. By using three different types of representations, multiple aspects of the data can be displayed, both as an overview and in more detail. By connecting the three representations, all views will display data for the same considered scenario.

To avoid problems with conjunctive search and similar colors (see section 2.1.1), only one shape is used in the scatter plot to represent the different machines. The problem of identifying objects with similar colors is solved by using a categorical color scale with colors as distinct as possible from one another.

The two datasets, presented in section 4.1, are used for the different representations. The first dataset, with primarily quantitative variables, is represented in a scatter plot and a histogram (section 2.3.4), while the second dataset, with primarily categorical variables, is shown in a parallel sets representation (section 2.3.4). The scatter plot is chosen to give the user the possibility to explore the relationship between quantitative variables. As described in section 2.3.4, scatter plots can be used to examine whether there is any correlation between two attributes (for example distance from sea and number of unplanned inspections). The histogram is chosen to provide information about, for example, how often a machine is inspected in general and whether any machines are inspected more or less often than that. The parallel sets representation is chosen to show the general flow of the data and possible relationships between categories. Furthermore, with its independent axes, as described in section 2.3.4, a large number of dimensions can be shown.

Figure 4.2: The ViSITelligence application allows users to explore the data in order to find patterns and relationships between variables. The representations are coordinated and share the same data filters.

4.3.1 Scatter plot

In the scatter plot, figure 4.3, each dot represents a machine and the colors represent the possible categories for a selected dimension, which can be, for example, the model of the turbine or the fuel type. The position of each dot corresponds to the values of the variables selected for the axes, using linear scales. Users have the opportunity to choose the variables used for the scales on the axes as well as the categorical dimension used to color the dots. An interactive legend is provided in the view, making it possible to filter out the data points represented by a certain value upon selection. All dots are initially of the same size to limit the representation to three dimensions. Through the use of the options panel, figure 4.3, the user can select a fourth dimension to be displayed as the size of the dots. When using the fourth dimension, the user has the possibility to filter out data outside of a selected interval for the variable.

Some machines have no value for a certain variable. The visibility of these machines can be toggled with a control in the options panel. If visible, they are represented with a value of zero. Machines with no value for one or both axes are hidden by default to limit misinterpretation of the values for the machines.

In addition to the interactive legends and the possibility to change the variables on the axes, tooltips are used in the chart so that users can hover over dots with the mouse pointer and get detailed information about the selection. While hovering over a dot, guidelines are drawn from the object to the axes to guide the user in interpreting the values for each axis. Zooming and panning functionality can also be applied, giving the possibility to focus on a certain area of the graph.

Figure 4.3: The scatter plot is used to display the relationship and correlation between two variables for the machines. The user can filter by items in the legend and change the dimension for coloring the bubbles. The user may also change the variable used for the size of the bubbles, which in this case is Variable10. For Variable10 only machines with a value between 6 and 17 are shown. The variables representing the axes can also be changed.

4.3.2 Histogram

The histogram, figure 4.4, is used to show the distribution of the current data for a selected quantitative variable. This allows spotting outliers in the data and discovering which machines are similar to the average. Initially the histogram groups the data points into bins of equal size. By default, the number of bins is half the number of possible values for the variable used for the bins. Due to limited space, the maximum number of bins that can be displayed is thirty. If more than thirty bins are used, the tick labels for the x axis cannot easily be read. This issue is part of the future work discussed in section 6.1. The number of bins can be changed using a slider provided in the options panel. This may result in differently sized intervals for the bins if the number of possible values for the variable is not evenly divisible by the number of bins. Due to the discrete nature of the dataset used by the representation, some intervals may include one value less or more than other intervals. However, the bins are drawn with equal width regardless of the interval size. The variable used for the bins of the histogram can be altered from the options panel, figure 4.4. In the current version, the vertical axis always represents the number of machines.
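The binning rules described above can be sketched as follows. Several details are assumptions and not taken from the application: the variable is treated as integer-valued, "half the number of possible values" is rounded down, the larger intervals are placed first, and the function names are hypothetical.

```python
def default_bin_count(n_values, max_bins=30):
    """Half the number of possible values (rounded down), kept within 1..max_bins."""
    return max(1, min(max_bins, n_values // 2))

def bin_intervals(lowest, highest, n_bins):
    """Split the integers lowest..highest into n_bins intervals whose sizes differ by at most one.

    Assumes n_bins does not exceed the number of possible values.
    """
    n_values = highest - lowest + 1
    base, extra = divmod(n_values, n_bins)
    intervals, start = [], lowest
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # the first `extra` bins hold one more value
        intervals.append((start, start + size - 1))
        start += size
    return intervals

n_bins = default_bin_count(10)
intervals = bin_intervals(0, 9, 3)
```

With ten possible values split into three bins, one interval holds four values and the other two hold three, which corresponds to the uneven intervals mentioned in the text.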

When a bin is hovered, it changes color to stand out from all other bins. A tooltip with detailed information about the selection is also displayed. The tooltip includes the interval of values covered by the bin, the number of machines, and the names of at most twenty machines within the interval. If the bin contains more than twenty machines, the remaining names are hidden and represented by three dots (...).
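The truncation of machine names in the tooltip might look like the sketch below. The function name, the line layout and the labels "Interval" and "Machines" are assumptions made for illustration; only the limit of twenty names and the three dots come from the description above.

```python
MAX_NAMES = 20  # at most twenty machine names are listed


def tooltip_text(interval, machine_names):
    """Build the tooltip text for a histogram bin.

    interval: (lo, hi) values covered by the bin.
    machine_names: names of all machines in the bin.
    """
    lo, hi = interval
    lines = [f"Interval: {lo}-{hi}", f"Machines: {len(machine_names)}"]
    lines.extend(machine_names[:MAX_NAMES])
    if len(machine_names) > MAX_NAMES:
        lines.append("...")  # the remaining names are hidden
    return "\n".join(lines)
```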

Figure 4.4: The histogram shows the distribution of the number of machines over a variable. In the options panel users can select the variable to be used for the distribution as well as the number of bins.

4.3.3 Parallel sets

The parallel sets representation, figure 4.5, allows users to analyze flows and patterns in the data. A wide flow further down in the chart implies that multiple observations share the same combination of categories for the different dimensions. The ribbons are curved by default to improve the traceability of each flow. Because curved ribbons change direction more smoothly than straight ones, the Gestalt law of continuity (section 2.1.2) is taken into consideration. However, straight ribbons make it easier to spot correlation between variables, which is why it is possible to toggle between straight and curved ribbons in the options panel.

Like the scatter plot and the histogram, the representation provides a tooltip showing detailed information when the user hovers over a ribbon with the mouse cursor. The tooltip lists the categories along the flow, from the top dimension down to the dimension of the hovered ribbon. It also gives the aggregated number of maintenance findings in the flow, both as a count and as a percentage of the total currently shown in the representation. Dimensions and categories within dimensions can be rearranged to display the flows from different perspectives.
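The count and percentage reported in the ribbon tooltip can be illustrated with a short sketch. It assumes that each maintenance finding is a record with one category per dimension; the function name and the dimension names used in the usage example are hypothetical.

```python
def flow_summary(findings, dimensions, path):
    """Count and percentage of findings that follow a flow.

    findings: list of dicts mapping dimension name to category,
        restricted to the findings currently shown.
    dimensions: dimension names ordered from the top of the chart.
    path: categories of the flow, one per dimension from the top
        down to the dimension of the hovered ribbon.
    """
    dims = dimensions[:len(path)]
    total = len(findings)
    count = sum(
        all(f[d] == c for d, c in zip(dims, path)) for f in findings
    )
    percent = 100.0 * count / total if total else 0.0
    return count, percent


# Usage with made-up data: four findings and two dimensions.
findings = [
    {"component": "blade", "remark": "crack"},
    {"component": "blade", "remark": "crack"},
    {"component": "blade", "remark": "wear"},
    {"component": "vane", "remark": "wear"},
]
print(flow_summary(findings, ["component", "remark"], ["blade", "crack"]))
```

Since the percentage is taken relative to the findings currently shown, it changes whenever filters are applied or reset.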

The options panel, figure 4.5, uses a three-column layout to make the most of the limited space. The first column has a checkbox for toggling the curvature of the ribbons and a button for resetting all of the applied filters. The other two columns consist of a checkbox for each dimension and an expandable menu to toggle the visibility of the controls for the categories connected to the dimension. These controls allow the user to remove uninteresting variables from all views by deselecting the checkboxes.

Figure 4.5: The parallel sets representation shows the flow of the maintenance findings divided into the different categories of each dimension. In addition to filtering the data by the different dimensions and categories, users can also rearrange the axes for the dimensions and the categories to get a different perspective. A button for resetting the filters and a checkbox for toggling the curvature of the ribbons are also provided.

4.4 Coordinated representations

The three representations used by the application are coordinated, meaning that filtering or selecting objects in one graph affects the view of all other representations. When a dot is selected in the scatter plot, it is highlighted by increasing its opacity while decreasing the opacity of all other dots. The histogram bin whose interval includes the value of the selected dot is highlighted by changing its color to differ from all other bins. In this way it is preattentively processed by the user when changing focus to the histogram representation. The connection between the representations also works in the opposite direction: when a bin is selected in the histogram, all machines connected to that bin are highlighted in the scatter plot. In both cases, the parallel sets representation is updated to display only data connected to the machine or machines included in the selection. Figure 4.6 shows the connection when a bin in the histogram has been selected.

Figure 4.6: The representations in the application are coordinated. By selecting a bin in the histogram, the machines related to that bin are highlighted in the scatter plot to be easily distinguished and preattentively processed (section 2.1.1). The parallel sets representation is updated to only show the flow for the machines connected to the selected bin.

Filtering data in the parallel sets representation, by selecting or deselecting checkboxes in the options panel, updates the data used by the scatter plot and the histogram. The data shown then includes only machines that have a connection to the combination of categories currently shown. In this way, uninteresting data points are removed from all views, allowing detailed analysis of subsets of the data.
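The coordination in both directions can be sketched as a few set operations. This is a simplified model under stated assumptions, not the application's implementation: each finding is assumed to carry a "machine" key naming the machine it belongs to, a machine is assumed to be "connected" to the shown categories if at least one of its findings has a visible category in every filtered dimension, and all function names are hypothetical.

```python
def machines_in_bin(machine_values, interval):
    """Histogram to scatter plot: machines whose value lies in the bin."""
    lo, hi = interval
    return {name for name, value in machine_values.items() if lo <= value <= hi}


def findings_for_machines(findings, machines):
    """Selection to parallel sets: keep findings of the selected machines."""
    return [f for f in findings if f["machine"] in machines]


def machines_matching(findings, visible):
    """Parallel sets to other views: machines connected to shown categories.

    visible: dict mapping dimension name to the set of categories whose
        checkboxes are currently selected.
    """
    return {
        f["machine"]
        for f in findings
        if all(f[dim] in cats for dim, cats in visible.items())
    }
```

Selecting a bin would call machines_in_bin to decide which dots to highlight and findings_for_machines to update the parallel sets, while a checkbox change would call machines_matching to decide which machines remain in the scatter plot and the histogram.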

4.5 Use case scenario

Although the representations are connected at all times, they can be used individually to answer questions by interpreting the results. An employee wonders which remarks most commonly cause an engine failure or cause the inspection criteria to be exceeded. Using ViSITelligence, the employee can focus on the parallel sets representation and filter out uninteresting dimensions to see only the relationships for the attributes relevant to the task at hand. The data is explored and the user gains insight into the relationship between different remarks and an engine failure. The original question expands, and additional dimensions are added to explore whether a certain configuration setting or the exposure to salt has any impact on the failure. The focus of attention changes to the scatter plot and the histogram, where the machines connected to the shown flows of the parallel sets are presented. Are they all close to the sea or located at a similar altitude? Are they on average inspected within the same interval of hours in operation? These questions can be answered by the scatter plot or the histogram.

A new question appears and the user resets all applied filters. Are any sites deviating strongly from the average of hours in operation between inspections? The user finds some sites that differ from the norm. Do those machines have something in common? The user selects the bin in the histogram and gains insight into their relationship according to the selected attributes in the scatter plot, and into the flow of the data in the parallel sets. The knowledge the user gains about the lifetime of the machines is communicated to other parts of the company to improve the components and the configuration of the machines and thereby increase their lifetime.
