
DEGREE PROJECT IN TECHNOLOGY FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Web-based interface for

data visualization

PANTEA TAVASSOLI

KTH

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
KTH ROYAL INSTITUTE OF TECHNOLOGY


Degree Programme in Computer Engineering
Date: June 2, 2020

Supervisor: Anders Sjögren
Examiner: Fadil Galjic

School of Electrical Engineering and Computer Science
Thesis constituent: Appva AB


Abstract

In the age of Big Data and exponential digitalization, data visualization is becoming a ubiquitous tool for understanding trends and patterns and for identifying deviations that better support decision making. The purpose of this thesis is to explore how a scalable data visualization interface can be designed with the open-source web library D3.js. The interface is designed to display a range of patients’ physiological measurements to help healthcare professionals with Covid-19 diagnosis. Several prerequisites were identified through a qualitative study and proved to ease the implementation process, such as choosing a robust model that can support visualizations despite discontinuous and incomplete datasets. Since faulty visualizations may lead to potential harm in the highly sensitive medical setting, a dedicated risk analysis was deemed beneficial and was therefore formulated. The design of the interface also revealed functionality that could be considered when implementing any visualization interface, such as the rendering of different views and features that can further assist the user in interpreting the visualizations.


Sammanfattning

In an age of Big Data and exponentially growing digitalization, data visualization is becoming an increasingly common tool for understanding trends and patterns and for identifying deviations in order to facilitate decision making. The purpose of this study is to explore how a scalable data visualization interface can be designed using the web-based library D3.js. The interface is designed to display a range of patients’ physiological measurements, with the aim of helping healthcare professionals diagnose Covid-19. Several prerequisites were identified through a qualitative pre-study. This study proved to ease the implementation process; among other things, a robust model that supports visualizations despite discontinuous and incomplete data series was identified. Since faulty, or partially working, visualizations can lead to potential harm in the highly sensitive medical setting, a risk analysis was considered beneficial. Such an analysis was therefore formulated, and it also proved to be useful in several contexts. The design of the interface also revealed common functionality that can be considered when implementing other visualization interfaces, including how views are rendered as well as features that guide the user towards interpreting the different visualizations more easily.


Acknowledgments

A special thanks to my advisors at KTH Royal Institute of Technology, Fadil Galjic and Anders Sjögren, for providing valuable insights and feedback throughout the degree project.

Also, warm regards to my supervisors and the project owner at Appva AB in Gothenburg. Thank you for guiding me while allowing me to realize this project. Together we will help reform the health and care sector through digitalization, one step at a time.


Abbreviations

MCSS Medication and Care Support System

CV Coefficient of Variation

RSD Relative Standard Deviation

DDDM Data Driven Decision Making

HTML Hyper Text Markup Language

CSS Cascading Style Sheets

SVG Scalable Vector Graphics

DOM Document Object Model

JS JavaScript

D3 Data Driven Documents

CSV Comma-Separated Values

JSON JavaScript Object Notation

WHO World Health Organization


1 Introduction

Data visualization [1] is the graphical representation of data and information. Using visual models such as charts, graphs and diagrams makes it easier to see and understand patterns, trends and deviations in large amounts of data. Arguably, we are in the age of Big Data [2], where visualization becomes an increasingly central aspect of managing vast amounts of data. When searching for a specific value or a deviation in raw data, the viewer is required to scan from top to bottom, which can become tiresome; with the help of data visualization, the viewer is instead provided with a more easily digestible view. Our eyes are drawn more easily to colors and patterns than to rows of data, and this can be used to advantage when presenting data. The amount of data generated by industries is growing exponentially. It is therefore becoming increasingly important to refine it into interactive visual objects, in order to make actionable decisions at an executive level. The field of data visualization is thus becoming a cornerstone in the age of data driven decision making.

1.1 Background

This thesis focuses on the data driven decision making aspects of visualizing data. Can a visual model reduce the user’s need to go through large amounts of data, and how can it assist the viewer in making decisions? The thesis explores different aspects of handling data and discovers possibilities for presenting it using a visualization library.

1.2 Problem

There are several tools for data visualization and data analysis, ranging from intuitive to more complex. It might be difficult for an engineer to know which tool to choose, and for many it may seem unnecessary to climb a steep learning curve just to create visual models for data that is already available. Many companies or curious individuals might want a pre-made interface with documented models that is not only easily accessible, but also scalable and appropriate for several purposes. Hence, the research question investigated in this thesis is:

RQ. How can a web-based interface for data visualization be created?

1.3 Purpose


1.4 Goal

The main goal of the thesis is to provide other developers with inspiration to create or expand a visual interface for data, by providing and documenting an applicable web interface that handles data visualization. The interface may be used as a web template, with graphs that are easily reused for different purposes.

To achieve a working template, a data flow needs to be mocked or extracted through data export, with regard to confidentiality. Evaluating and learning a chosen visualization library is expected, along with observations from different literature on how data presentation is best approached. Hence, a study along with development will fulfill the goals of creating and documenting a data visualization interface that is appealing, user-friendly and scalable.

1.5 Sustainability and Ethics

The stakeholder of the thesis project is the constituent Appva AB [3], who will sustain the interface by integrating it into future projects. Other engineers may also use this thesis to find inspiration or help to develop their own web-based interface for data visualization purposes.

In terms of ethics, the extracted data that the visualizations are based upon needs to comply with integrity policies related to user data. [4]

1.6 Methodology/Methods

Firstly, a literature study was conducted with the aim of identifying important prerequisites for visualizing data on the web. In the study, web libraries were explored, types of data evaluated, and data presentation studied. The study took a qualitative approach, where relevant information was acquired through observations and by reviewing different documentation platforms. Each literature source and tool was reviewed for relevance to the thesis.

Then a case study, developing the interface, began, with the opportunity to apply theories and information acquired in the literature study. During this process, iterative goals were set through the project method in order to produce a result that could then be extended with other models.


1.7 Constituent

The constituent for this thesis is the company Appva AB. The company provides the service Medication and Care Support System, MCSS [3], which handles digital signatures [5] for interventions in health and social care, eliminating unnecessary paperwork that many healthcare providers deal with on a daily basis.

In the midst of the Covid-19 pandemic, the company added support in MCSS for measurements related to the disease. The service generates large amounts of data, and the company wants to present it in a helpful and readable way.

1.8 Delimitations

The focus of this study is the presentation of data through the development of a web-based data visualization interface. Data models will be evaluated, but only one model will be implemented, in order to provide a more consistent interface and to avoid several, but incomplete, models.

1.9 Outline

The thesis is structured as follows:

• Chapter 2 presents theoretical background necessary to follow the rest of the thesis.

• Chapter 3 describes the research approach and methods used in the thesis.

• Chapter 4 presents the prerequisites for developing a web interface and results acquired from the literature study.

• Chapter 5 presents the results from a case study, which is the design and implementation of the web interface.

• Chapter 6 summarizes the results chapters of the thesis and answers the research question.


2 Theoretical Background

This chapter describes the theoretical background of the degree project and related work. The chapter begins by introducing the concept of data presentation, along with the importance of choosing a model suitable for the purpose. This is followed by an introduction to Data Driven Decision Making, DDDM, and why it is a thought process that should be considered when working with data visualizations. The next section overviews core concepts of how a web page works, relevant to understanding the interface, followed by an introduction to the Medication and Care Support System, MCSS, which the measurements belong to and which the interface will be integrated into. The chapter ends with a section overviewing related work that has been studied.

2.1 Data Presentation

Data visualization is a presentation of data in a graphical format, generated by software. This visual representation of data provides the user with a perceptive view for studying information by observing patterns, correlations and causalities. In addition, data visualization can facilitate decision making backed by data, rather than intuitive observations alone. [1] This type of decision making has, along with other benefits of visualized data, in recent years become a fundamental tool for guiding industries from a business perspective.

Every data presentation should be self-explanatory, so that the reader does not need to consult the text the presentation refers to; only labels or explanatory text related to the models need to be provided. [6]

2.1.1 Variables

An essential stage in presenting data is to understand how to classify different types of variables and how they should be presented in tables or graphs. [6] Variables can be divided into two main groups: qualitative or quantitative. Qualitative variables are categorical, meaning that they can be divided into different categories. Examples of qualitative variables are blood groups (several categories) or the presence of a pathogen in a body (yes or no). Quantitative variables are numerical and often richer in information, and hence preferred for statistical purposes. Furthermore, quantitative variables can be divided into continuous and discrete subgroups. Discrete variables are observations that take only certain numerical values, such as a person’s age in whole years or the number of coin flips, while continuous variables can take any value within a range. [7]

2.1.2 Presenting qualitative variables


sufficient, although for the sake of visualization, a more vivid graph or chart can be used. Suitable examples are a bar graph or pie chart, where each category can be distinguished by a dedicated color shade along with a matching label for easier identification. [9]

2.1.3 Presenting quantitative variables

Discrete data can, similarly to qualitative data, be put into separate groups, with the difference of using discrete values instead of category names. In the case of bar graphs, discrete data calls for a histogram rather than a bar graph, where a histogram has no horizontal spacing between the numerical values along the horizontal axis.

Continuous data is infinite, hence putting it into categories is not possible. Instead, ranges of data can be introduced with lower and upper limits, where the range width is the difference between the limits. A range can also be open ended, having no lower or upper limit, for example a range such as “bigger than” or “lower than”. These ranges should not overlap, and grouping too much data into one range can lead to loss of detail. Continuous data is typically presented with line graphs and plots. In addition, an important property of quantitative data is its distribution, showing how data values are spread out, allowing us to draw statistical conclusions. [10]
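The grouping of continuous values into non-overlapping ranges described above can be sketched in plain JavaScript. D3 offers d3.bin for this purpose; the helper below is a simplified, hypothetical stand-in, shown only to make the idea concrete:

```javascript
// Group continuous values into non-overlapping ranges (bins) of equal width.
// Returns one bucket per range, each with its lower/upper limit and its values.
function binValues(values, lower, upper, binCount) {
  const width = (upper - lower) / binCount;
  const bins = Array.from({ length: binCount }, (_, i) => ({
    lower: lower + i * width,
    upper: lower + (i + 1) * width,
    values: [],
  }));
  for (const v of values) {
    if (v < lower || v > upper) continue; // out of range; open-ended bins could catch these
    // Clamp v === upper into the last bin so the top limit is inclusive.
    const i = Math.min(Math.floor((v - lower) / width), binCount - 1);
    bins[i].values.push(v);
  }
  return bins;
}

// Body temperatures binned into 0.5 °C ranges between 36 and 40 °C:
const bins = binValues([36.4, 37.1, 37.2, 38.9, 40.0], 36, 40, 8);
console.log(bins[2].values); // values in the range [37.0, 37.5)
```

Note how the ranges share their limits but never overlap, and how the bin count trades detail against readability.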

2.1.4 Coefficient of variation

In statistics, the Coefficient of Variation, CV (also called Relative Standard Deviation, RSD), is the ratio of the standard deviation to the mean, measuring the spread of data. It is useful since the value of CV is independent of the unit, and hence often called dimensionless. It is instead expressed as a percentage, unlike the standard deviation, which has to be understood in the context of the specific data.

The definition is:

CV = (σ / x̄) · 100 %

where σ is the standard deviation and x̄ is the mean.
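As a worked example, the coefficient of variation can be computed in a few lines of JavaScript (the function name is illustrative, not from the thesis):

```javascript
// Coefficient of variation: standard deviation divided by the mean,
// expressed as a percentage. Dimensionless, so comparable across units.
function coefficientOfVariation(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);
  return (sd / mean) * 100;
}

console.log(coefficientOfVariation([10, 10, 10])); // 0 — no spread at all
console.log(coefficientOfVariation([2, 4, 6]).toFixed(1)); // 40.8
```

Because the unit cancels out in σ / x̄, the same function can compare the spread of, say, body temperatures and heart rates directly.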


2.2 Models

When presenting data, several models should be considered. With the right model, visualizations can summarize large amounts of data efficiently in a graphical format. When deciding on the model type, common aspects to consider are whether change over time, data distribution or relationships between variables is the priority. [12]

2.2.1 Scatter plots

Scatter plots or graphs are used to observe relationships between, commonly, two variables. Each dot is positioned along a horizontal and a vertical axis to indicate the values of two numerical variables, and can additionally be coded by colour, shape and size. A scatter plot is not only a standard when showing the relationship between two values, but is also a natural choice of model when data is not continuous: since it consists of dots, see Figure 1, rather than a connected line, values can be discrete or sparse without depending on a continuous representation.

Figure 1 A prototyped scatter plot made in D3.js

A scatter plot can easily suggest various kinds of correlations between variables, ranging from positive to negative, linear to non-linear, weak to strong, depending on the slope of the plotted values. Another useful aspect is the identification of patterns in the data, since the data points can be grouped into clusters depending on how close the dots are to each other.
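Positioning the dots of a scatter plot amounts to mapping each data value linearly from its domain onto a pixel range. In D3 this is what d3.scaleLinear does; a minimal plain-JavaScript equivalent of that mapping, with illustrative names, might look as follows:

```javascript
// Linear scale: maps a value from a data domain onto a pixel range.
// This is the core computation behind positioning dots in a scatter plot.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Heart rates between 40 and 180 bpm mapped onto a 500 px tall chart.
// The pixel range is inverted because SVG's y axis grows downwards.
const y = linearScale([40, 180], [500, 0]);
console.log(y(40));  // 500 — lowest value at the bottom of the chart
console.log(y(180)); // 0 — highest value at the top
console.log(y(110)); // 250 — midpoint
```

One such scale per axis, applied to each data point, yields the (x, y) coordinates of the dots.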


2.3 Data Driven Decision making

The purpose of this section is to provide an insight into what Data Driven Decision Making, DDDM, is and how it can help make better use of gathered data. The concept of data science is introduced as a prerequisite to understanding the process of DDDM, since DDDM can be seen as a central aspect of the data science field.

DDDM is a process that ought to be in the back of the mind of anyone who works with data visualizations. [13]

2.3.1 Data science

Data science involves many principles, originating from different fields of study. Collected data must be sorted in various ways, and a fundamental understanding of statistics is often required. Also, methods and tools for visualization must be acquired in order to present extracted and useful knowledge from data. [14]

An example of a topic that needs to be considered within data science is causal analysis [15], also called root-cause analysis, which focuses on finding the root of a problem rather than its symptoms, and is often part of the data scientist’s role. Generally, the analysis involves causal discovery, allowing further understanding of the factors that impact the outcomes in the analyzed data, rather than relying on correlation analysis alone.

2.3.2 The process

In the current data-oriented business environment, the purpose of data science can be further described as the goal of improving decision making. In particular, DDDM refers to the practice of making decisions based on analysis of data, rather than pure intuition or observation alone. Generally, DDDM endorses a more business-related process of data science, where anyone analyzing data has to tackle business problems from a data perspective. The process can be described as backing the company’s decisions with data analytics, such as utilizing observed patterns in data and further developing strategies that benefit the business. [16]

The DDDM process typically utilizes the two types of data, qualitative and quantitative, mentioned previously in regard to data presentation. What can be achieved from the two types of data may differ, but there are set pointers that can be considered when conducting a DDDM strategy [17, Ch. 2], regardless of whether the data is qualitative or quantitative.

Deconstruction of a question into a set of researchable steps


• What type of data is needed?

• What strategies are required? Is root cause analysis applicable?

• Can the question be answered? Is the question worth being answered, or is the required resources and cost of answering the question higher than the worth of the anticipated outcome?

The sub-questions should form a foundation to build upon, used as the first step of the DDDM process. [17, Ch. 2]

Rational Decision Making

When the important questions have been identified, the next pointer relates to rational decision making. With growing data volumes and higher data variety, the need for informed decisions based on evidence becomes more relevant. [18] Gathering evidence to inform a reasoned choice among relevant alternatives can indicate whether an approach or a line of reasoning is valid.

Evidence is not always conclusive, so it may be supplemented by reasonable assumptions. These assumptions can be biased, since much of our mental thought process is instinctive. [19] Logical flaws can easily occur, and sometimes even wishful thinking may be incorporated into an otherwise rational thought process. Identifying that bias exists and addressing it can limit its impact, and combined with more evidence and collaboration, bias may be further reduced. [18]

Data Gathering

One of the sub-questions identified in the process concerns the ability to move data between programs and tools used for data presentation, such as in and out of databases, spreadsheet programs and libraries used for data visualization. The data needs to be exported accordingly, so that it is ready for analysis and presentation; hence some knowledge of how data formats work is required. [17, Ch. 2]


2.4 Handling a webpage

Since the interface will be web-based, an understanding of how a web page is built is important. This section introduces concepts, semantics and libraries needed to create the web-based interface.

2.4.1 HTML

Hyper Text Markup Language (HTML) consists of a series of elements that structure and display the plaintext content of a web page. Each element is represented by an opening and closing tag that labels pieces of content using semantics. [20] Furthermore, each tag can be extended with attributes affecting how a browser will interpret the element. When a file uses the .html extension, a web browser will automatically render its content. [21]

2.4.2 CSS

While HTML structures content displayed on web pages, Cascading Style Sheets (CSS) is used to style the plain text content. In addition, it can also affect the choice of rendering, such as on screen, in speech or any other media. Just like HTML, CSS is one of the core languages of the web and hence standardized. [22]

2.4.3 SVG

Scalable Vector Graphics (SVG) is a language for describing graphics, just like HTML describes text. It is used to define vector-based graphics, and every element and attribute in an SVG file can be manipulated, for example animated. The advantage of SVG images compared to other image formats is that the image can be rendered at any size without loss of quality. [23] Like HTML and CSS, SVG is a standard, maintained by the World Wide Web Consortium (W3C). [24]

2.4.4 DOM

A web page can be seen as a document, and in order to manipulate its structure, style and content, the document can be represented as a Document Object Model (DOM). [25] The DOM is a programming interface representing the document as a logical tree of nodes and objects, allowing any programming language, but traditionally scripting languages such as JavaScript, to modify the content. Furthermore, the DOM allows programmers to access and modify any HTML element, for easier management of the web page structure.
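The tree structure of the DOM can be illustrated with a plain-object sketch: each node has a tag name and a list of children, and recursive traversal of that tree is how scripts locate elements to modify. The mini-tree below is a made-up example, not a real browser API:

```javascript
// A simplified DOM-like tree: each node has a tag and a list of children.
const tree = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", children: [] }] },
    { tag: "body", children: [
      { tag: "svg", children: [{ tag: "circle", children: [] }] },
    ]},
  ],
};

// Depth-first traversal collecting every tag name, mirroring how
// DOM APIs walk the document tree to find elements.
function collectTags(node) {
  return [node.tag, ...node.children.flatMap(collectTags)];
}

console.log(collectTags(tree)); // [ 'html', 'head', 'title', 'body', 'svg', 'circle' ]
```

In a browser, the same traversal idea underlies lookups such as finding all SVG elements to restyle or animate.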


Figure 2 An example of a DOM representation.

2.4.5 JavaScript

While a webpage’s content is defined by HTML and style specified by CSS, its behavior can be programmed with JavaScript. All modern web browsers have a JavaScript engine and the language is an essential part of web applications, enabling a more interactive and dynamic side to the web. [27] JavaScript is one of the most commonly used languages in the world, and with the web browser hosting the scripts, developers can validate data, transmit information or traverse through the DOM using JavaScript libraries, such as the query language jQuery. [28]

2.4.6 D3.js library

Data-Driven Documents (D3.js) is a JavaScript library used for the purpose of producing data visualizations in web browsers. The library utilizes web standards such as rendering with HTML, CSS, SVG and DOM-manipulation, offering more expressive visualizations and improved performance.
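The central idea in D3 is the data join: incoming data is matched against existing document elements, yielding an enter set (data without elements), an update set (matched pairs) and an exit set (elements without data). A plain-JavaScript sketch of that matching, with hypothetical names rather than D3's actual API, could be:

```javascript
// Sketch of D3's data-join concept: match data items to existing elements
// by key, producing the enter / update / exit sets a selection exposes.
function dataJoin(existingKeys, data, keyOf) {
  const existing = new Set(existingKeys);
  const incoming = new Set(data.map(keyOf));
  return {
    enter: data.filter((d) => !existing.has(keyOf(d))),  // new data, no element yet
    update: data.filter((d) => existing.has(keyOf(d))),  // data with a matching element
    exit: existingKeys.filter((k) => !incoming.has(k)),  // elements whose data is gone
  };
}

const join = dataJoin(["p1", "p2"], [{ id: "p2" }, { id: "p3" }], (d) => d.id);
console.log(join.enter); // [ { id: 'p3' } ] — create an element for p3
console.log(join.exit);  // [ 'p1' ]        — remove the element for p1
```

In D3 itself, the enter set is where new SVG shapes are appended and the exit set is where obsolete shapes are removed, which is what keeps a chart in sync with changing data.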

For more information about D3.js, see [29].

2.4.7 CSV

D3.js provides several ways to access data. One approach is to use the built-in support for parsing Comma-Separated Values, CSV files. This format is widely used with spreadsheet programs such as Microsoft Excel and is space efficient, improving loading times for large datasets. [30]
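What parsing a CSV file amounts to can be approximated in a few lines of plain JavaScript. The simplified parser below handles only unquoted fields, which is enough to show the idea; real CSV parsing, including quoted values, is best left to the library:

```javascript
// Minimal CSV parser: the first line is the header, each following line a row.
// Simplified — no support for quoted fields containing commas.
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const headers = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, cells[i]]));
  });
}

const rows = parseCsv("date,temperature\n2020-05-01,37.2\n2020-05-02,38.4");
console.log(rows[1].temperature); // '38.4' — values arrive as strings
```

Note that every parsed value is a string, so numerical columns such as a temperature need explicit conversion before being plotted.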

2.4.8 JSON


2.5 Medication and Care Support System

In 2012, Appva launched the digital service Medication and Care Support System, MCSS [3], the first of its kind in Sweden. It is used within geriatric care, home care and other relevant types of care that have previously required paper signatures, see Figure 3, which has led to unnecessary and time-consuming paperwork. With the help of MCSS, a care assistant can sign medical activities for a patient on a supported device, such as a phone or iPad, see Figure 4. Resources that are typically time-consuming become more efficiently distributed, and more time can be spent on the patients themselves.

The system is continuously being developed with new add-ons to further address user needs within the different healthcare systems. Examples of features supported by the system are checking whether patients have received the interventions they require, and registering drugs and medical measurements for individual patients. Furthermore, MCSS has been shown to reduce medication deviations by more than 90%. [3]


Figure 4 An example of a digital signature list of delegations on a digital device. The example is taken from Appva’s website.

2.5.1 Covid-19 measurements

Prior to the thesis, the Covid-19 pandemic broke out, affecting the world and the setting of the thesis. In the midst of the outbreak, MCSS added support for dedicated measurements related to the virus infection, to provide further digital support for healthcare. These measurements are introduced here since they constitute the majority of the data used or mocked for the visualizations. [32]

Body temperature


Respiratory frequency

Another symptom that patients can develop with a Covid-19 infection is shortness of breath. [33] Due to difficulties in breathing, more breaths may be taken in order to compensate for the lack of oxygen. A patient’s respiratory rate is the number of breaths taken per minute and is an important measurement to take along with oxygen saturation. Normal respiration rates for a resting adult range from 12 to 16 breaths per minute, and the rate may increase with fever or shortness of breath. [34]

Oxygen saturation

The human body requires a balance of oxygen in the blood to function properly. Usually this is in the range of 95–100%, and a value under 80% may compromise organ function. [35] A pulse oximeter [36] can be used to measure the percentage of oxygen in the blood; usually the device is clipped to a finger or the earlobe. Since long-term difficulty breathing may lead to lack of oxygen and is a common symptom in severe Covid-19 infections, measuring oxygen saturation is highly relevant.

2.6 Related works

Prior to the thesis, two related works have been researched more thoroughly as a part of the literature study. The first one is a paper produced by Massoomi and Handberg [37], where the role of smart devices within modern medicine is discussed. The second work is Apple’s iOS health app which is available on almost all of Apple’s smart products such as the iPhone, iPad or the Apple Watch. [38]

The iOS Health application is one of the applications mentioned in the paper written by Massoomi and Handberg and is further regarded as a source of inspiration for the thesis, due to its visualizations of medical measurements.

2.6.1 Massoomi and Handberg

Since the visualizations will be based on medical measurements, the paper written by Massoomi and Handberg is relevant as a general background to how visualized metrics can be helpful in healthcare as part of the new era of technology. The report explores possibilities with data visualization on smartphones and other devices available to users, and how the validity of these visualizations is evaluated. For example, the same data can be perceived differently depending on the choice of visualization application, with error margins that can lead to overestimations or faulty recommendations.


2.6.2 iOS health application

The iOS health application, in particular its functionality for measuring heart rate, is interesting from a data visualization perspective. The gathered data, in this case a person’s heart rate, is visualized in the application, with data entered either manually or automatically through wireless blood pressure monitors.

The main property of interest is how the visualizations of the data have been designed and presented. Since heart rate is a medical measurement it is especially relevant for the thesis, and from a statistical point of view it is also worthwhile to evaluate the applied model based on the form of the data. In this case, a type of scatter plot has been implemented, see Figure 5, mainly due to the nature of the data, but perhaps also because the data cannot be ensured to be continuous.

From a user experience perspective, the design of the visualizations is also interesting, for example how information is prioritized depending on the chosen time interval, or the style of the axes.


3 Method

This chapter describes the research strategy and methodologies applied in the degree project in order to answer the research question. The first section overviews the choice of methodologies for the project. This is followed by a section that presents the identified sub-questions, which highlight important aspects of the thesis. The last section describes the applied research strategy.

3.1 Research methodologies

This section gives a brief introduction to the research methodologies applied in the project, along with why certain approaches were chosen.

3.1.1 Quantitative and qualitative methods

Quantitative research methods [39] focus on numerical results, examining relationships between numerically measured variables. The data is analyzed with the help of statistics, and the methods are often based on random sampling and collection instruments. Due to their unbiased approach to data and results, findings of quantitative studies may be easier to summarize and compare.

Meanwhile, qualitative research methods [40] are focused more on textual data, such as interviews, observations and focus groups. Findings of qualitative studies may provide rich data and a more in-depth picture of the context, but could lack generalizability by being too reliant on subjective interpretations.

3.1.2 Inductive and deductive methods

An inductive research method (inductive reasoning) [41] follows a bottom-up approach, where empirical data is collected before developing a hypothesis. The collected data will be studied, with the intention to create a theory that may explain observed patterns found in the study.

On the contrary, a deductive research method (deductive reasoning) [42] follows a top-down approach, where a hypothesis is based on already available theories. The intent is instead rather focused on testing and verifying the existing theory.

3.1.3 Case studies


3.1.4 Bunge’s method

Bunge’s method [44] focuses on research being based on intuition and common sense, built from less qualified guesses. This easily leads to a “trial and error” method until satisfactory results are found under defined conditions. Andersson and Ekholm offer a generalized version of Bunge’s method [45], adapted for technological research. The generalization consists of the following steps:

1. How can the current problem be solved?

2. How can a technique/product be developed to solve the problem in an efficient way?

3. What base/information is available and required to develop the technique/product?

4. Develop the technique/product based on the information gathered in

step 3. If the technique/product is satisfactory, go to step 6.

5. Attempt to create a new technique/product.

6. Create a model/simulation of the suggested technique/product.

7. What are the consequences of the model/simulation in step 6?

8. Test the implementation of the model/simulation. If the outcome is dissatisfactory, go to step 9, otherwise go to step 10.

9. Identify and correct flaws with the model/simulation.

10. Evaluate the result in relation to existing knowledge/praxis and in addition, identify new problem areas for further research.

3.2 Sub-questions of the research question

The research question stated in the thesis is: “How can a web-based interface for data visualization be created?” The question may at first seem general; in an attempt to narrow it down and identify topics that should be considered, the main research question can be divided into sub-questions. The following paragraph presents the identified sub-questions:


3.3 Adapted research strategy

The method used in the thesis project can be described in different stages, see Figure 5. The first stage consisted of the problem statement, the research question that the thesis tried to answer. The problem statement led to the next stage, the literature study, with preparations such as researching data visualization as a topic, finding relevant libraries and documentation, and reviewing related works. After the literature study, a case study began with the intention of designing and implementing a web-based data visualization interface. When a viable version of the interface had been created, the interface was evaluated for possible integration. Finally, the results were summarized, the research question answered, and the results evaluated and discussed.

Figure 5 A brief overview of the different stages of the thesis project.

In the literature study, a qualitative approach was primarily taken. The choice of data visualization library was based on observations of different documentations and on whether the library is easily compatible with already set factors, such as the developer's own skills or future integration into relevant systems. In the case of which data needs to be presented, the approach can also be seen as qualitative, since these were already set goals of the company, wanting to provide visualizations of measurements that can help specific groups of people, based on an observed potential need.

The research did not have a predefined hypothesis waiting to be proved. Instead, conclusions were drawn based on the gathered experience during the research in order to answer relevant research questions. Hence, an inductive approach was more appropriate.


In addition, the development of the interface within the case study was approached in an agile way, or in iterations. Fortunately, Andersson and Ekholm's generalization of Bunge's method can be applied when developing a technological product in iterations, see Figure 6.

Figure 6 The research method based on Bunge’s method that was followed when developing in the project.

3.3.1 Understanding

Following the first step of Bunge's generalized method, “How can the current problem be solved?”, requires an understanding of the problem. To gain a better understanding of the problem area and to discover relevant possibilities or clues, a brief literature study around the research questions was made in order to identify relevant topics. The research questions can be seen as the result of dividing an initially broad thesis question into sub-questions, and by breaking it down into smaller, identified questions, it was easier to understand and tackle the initial problem.

3.3.2 Researching


In this step, relevant literature, libraries and documentation were reviewed so that the next stage, the development, would be as efficient as possible, avoiding difficulties such as not knowing where to start or which tools to utilize. Even though an initial research phase was conducted, these steps were frequently revisited at each new iteration of the developed interface or when issues occurred.

3.3.3 Developing and testing

When the initial literature study was done, the methods and information gathered from the study were applied at the start of the case study, which was to design and implement the interface.

On a daily basis, “tickets” according to agile project methods, or smaller issues, were created to identify required steps and to keep track of the progress. The issues were also assigned a level of effort, as a measure of difficulty, in an attempt to more easily identify time-consuming issues or features, making it easier to plan the workload of the iteration. Since the degree project was time restricted, prioritizations in the implementation were made. Rather than implementing many partial models, the focus was on implementing one working model in depth.

Furthermore, each feature had three phases:

1. Finding resources, if needed.

2. Implement the required functionality to solve the issue.

3. Test and debug to ensure the functionality acts as expected.

This step of the research method was very extensive, but briefly followed steps 4-9 of Bunge's method.

3.3.4 Evaluation

The last step of the adapted research method was based on the last step of Bunge's method, “Evaluate the result in relation to existing knowledge/praxis and in addition, identify new problem areas for further research”. This step ensured that the research questions could be answered during the project, in particular after conducting a dedicated literature study and a thorough case study. Furthermore, this step also evaluated the developed interface and how it can be improved upon in the future.

Another aspect considered in the evaluation was the work method of the degree project, mainly to identify whether the chosen work process was successful or if another work method could be suggested.

3.4 Development tools

This section overviews the tools such as the development environment, testing and project model used in the project.

3.4.1 Development environment


The development environment consisted of a code editor [47] with relevant plug-ins, along with MAMP [48], a local server environment used to view the web-based interface.

Before real data could be connected to the visualizations, mocked data was required. Excel and online random generators, such as Mockaroo [49], were used to mock data. GitHub was frequently used for several purposes. One purpose was version control of the code and giving other developers transparency on how the project has been implemented. Another purpose was to host mocked data used for the visualizations through a URL and, finally, to integrate the visualizations into the constituent's system as a part of a release system. [50]

3.4.2 Testing and debugging tools

An external testing tool was not required when developing the interface. Instead, the web browser's developer tools, with the console, were sufficient for inspecting the DOM. Debugging was mainly done in the web browser, and the choice of web browser varied for the sake of testing the different environments where the interface would be shown, since web browsers tend to render content differently.

3.4.3 Project model

The thesis project was performed within the company's agile project model, with iterations, “use cases” and “tickets”. Along with the agile project method, the MoSCoW method [51] was considered, with set priorities, in order to identify requirements within the iterations and throughout the degree project so that a final result would be achieved. If the opportunity occurred, further implementations and research would be relevant through additional iterations:

M (Must have):

• Answering the research questions.

S (Should have):

• A web-based interface with an appropriate model, used to create views with data presentation using a JS library.

C (Could have):

• A fully functional product that can be integrated into future projects.

W (Will not have):


4 Data Visualization Interface: Prerequisites

In addition to knowing how to handle a web page and to control web content, other requirements need to be fulfilled before designing a visual interface. When identifying these prerequisites, the DDDM process can be followed and this section presents the documented preparation stage before the creation of the web interface.

4.1 Data gathering

A very important prerequisite, also mentioned in the DDDM process, is data gathering. When the type of data has been decided upon, in this case the Covid-19 measurements, the next step is how the data will be gathered. The data is stored in a database handled by the constituent and needs to be exported in a format suitable for visualizing.

Different data formats were tested, mainly the CSV and JSON formats. However, the constituent follows the Fast Healthcare Interoperability Resources, FHIR, specification for JSON formats when exporting data, which is an international standard for health care data exchange. [52]
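Since the export follows the FHIR specification, the visualization code has to pick the plottable fields out of the exported JSON. The sketch below is a minimal, hypothetical example: it assumes the standard FHIR Observation fields valueQuantity and effectiveDateTime, while the constituent's actual export may be structured differently.

```javascript
// Sketch: extracting plottable {timestamp, value, unit} pairs from a
// FHIR-style bundle of Observation resources. The exact structure is an
// assumption based on the FHIR Observation fields valueQuantity and
// effectiveDateTime; the real export may differ.
function toDataPoints(bundle) {
  return bundle.entry
    .map((e) => e.resource)
    .filter((r) => r.resourceType === "Observation" && r.valueQuantity)
    .map((r) => ({
      timestamp: new Date(r.effectiveDateTime),
      value: r.valueQuantity.value,
      unit: r.valueQuantity.unit,
    }));
}

// Example bundle with one oxygen saturation measurement
const bundle = {
  entry: [
    {
      resource: {
        resourceType: "Observation",
        effectiveDateTime: "2020-05-02T08:30:00Z",
        valueQuantity: { value: 96, unit: "%" },
      },
    },
  ],
};

const points = toDataPoints(bundle);
console.log(points[0].value, points[0].unit); // prints: 96 %
```

Keeping the extraction in one place like this also means that a change in the export format only affects a single function, not the rendering code.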

4.1.1 Mocked data

Even if real user data is already stored, commonly in a database, it is not certain that the data is in a state where it can be visualized. The data needs to be exported into a suitable format, as well as cleared of user-identifying parameters according to security policies. This is rather time consuming, and to kickstart the visualizations, fake (also called mocked) data can provide a ground to base the visualizations on.

There are several aspects that should be considered when creating the mocked data. At what stage of the prototyping will the data be used? Is it used for the purpose of gathering information related to the visualizations, such as the domain of the graphs or which model is suitable? Or is it mainly to have a data flow to test the model on? Depending on the answer, the requirements on the mocked data can differ.

Random data

If the mocked data is required for the sole purpose of having a data flow with matching parameters, random data generating tools can be used. Mockaroo [49] was used to generate a data flow quickly. The generator offers plenty of data types and limits compatible with most databases. Furthermore, it offers formats such as SQL, Excel, JSON and CSV, the latter two being compatible with D3.js.

“Realistic” data


A statistical measure such as the coefficient of variation, CV, can be used to approximate the spread that can occur in the real data.

Low spread

With a lower CV of 1.5 %, the values are expected to show a limited spread. Figure 7 implies that the domain of the y-axis may need to change in order to show a more proportioned result.

Figure 7 A scatter plot made in Excel with values plotted with a CV of 1.5 %.

High spread

With a higher CV of 4.6 %, the values are expected to be more spread out, see Figure 8.


Unfortunately, Mockaroo and similar generators do not provide this variation of data, since the concept is based on true random generation. Instead, Excel was used to generate data with the different variations, see Figures 7 and 8. The advantage of using a data flow with a set CV is that it can, in an early stage of prototyping, assist with relevant information and actions based on data that is more convincing than data that has been generated randomly.
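The CV-based mocking described above can also be scripted instead of done in Excel. The sketch below is a hypothetical generator, not the tooling used in the project: it draws normally distributed values around a mean with a chosen coefficient of variation (CV = standard deviation / mean).

```javascript
// Sketch: generating mock measurement values with a chosen coefficient of
// variation, so the spread mimics real data. Mean and CV are illustrative.
function mockValues(n, mean, cv) {
  const sd = cv * mean;
  const values = [];
  for (let i = 0; i < n; i++) {
    // Box-Muller transform: two uniform samples -> one standard normal sample
    const u1 = 1 - Math.random(); // shift to (0, 1] so Math.log never gets 0
    const u2 = Math.random();
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    values.push(mean + sd * z);
  }
  return values;
}

// e.g. 50 oxygen saturation readings around 96 % with a low spread (CV 1.5 %)
const lowSpread = mockValues(50, 96, 0.015);
```

Swapping the CV argument (for example 0.046 for the high-spread case) reproduces the two scenarios shown in Figures 7 and 8.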

4.2 Choice of a model

When the data is collected, different models need to be considered and the purpose of the data visualization can be restated. The purpose is to show change over time, as well as to observe how data is distributed, to assist a client in interpreting a trend or noticing deviations that may be difficult to distinguish.

When the purpose of the data presentation is settled, the nature of the data must be considered. The Covid-19 measurements mainly consist of two types of variables: one numerical value and one time stamp of when the measurement was registered. Both of these values are considered quantitative. Furthermore, they are both non-continuous, since the service does not ensure continuous data inputs. When deciding on a suitable model, an attempt to eliminate any bias should be made, mainly through evidential research and discussions with others. When a conclusion has been reached, the process can move on. The most suitable model was the scatter plot, mainly because of its flexibility towards the type of data, which does not need to be continuous, but also since it is easy to see individual data points, making it easier to draw conclusions.

4.3 Choice of a visualization tool

Considering that the visualization interface should be available on the web and be able to communicate with a user accessing it, the choice of visualization library should be compatible with web standards and common markup. This narrows down the choices, but plenty of visualization libraries are still available, and research took place to evaluate the different libraries and find a suitable candidate. To summarize, the following aspects were considered when finding the right library:

• Which programming languages are compatible with the library?

• Does the library provide enough documentation so that the learning curve would not be overwhelming?


Plotly

The most popular aspect of Plotly is, according to stackshare.io, that Plotly allows bindings to other popular languages such as Python or MATLAB. [53] For developers not so acquainted with JavaScript, Plotly can be a confident choice. However, due to its relatively recent release in 2013 [54], the documentation of Plotly is relatively limited. It is also built on D3, which means that it cannot offer functionality that D3 does not already provide.

D3

D3 emphasizes web standards and has broad approval, with many developers using it on popular sites such as GitHub. [53] It has a complex syntax, similar to jQuery, but since a lot of resources such as literature and documentation are available, it is not difficult to receive the help needed to work with the library. In conclusion, if JavaScript is a comfortable language to work with, D3 may be the more resourceful choice.

4.4 Risk analysis

This section summarizes identified risks posed by different visualization approaches or techniques. The primary risk was to render the data inconclusively, misleading the viewer into faulty assumptions. Especially since the visualizations were based on medical measurements, interpreted by clients within the medical sector, the risks are far from negligible and should be carefully considered. The following risks were identified:

Color coded deviations and outlier values

An idea to further assist the viewer is to apply color coding as an indication of deviation from normal tendencies. However, the choice of color can resonate differently with users, risking non-compliance with other applied colors and signaling the wrong message to the viewer. In addition, a personally biased interpretation of the color coding is likely to occur, and extensive user research would be required.

Adding a trendline in the scatter plot

As mentioned, the data is not guaranteed to be continuous; hence, adding a trendline connecting the data points could cause the data to be interpreted as continuous, which would be faulty.

Presenting data as mean or median values


5 Data Visualization Interface: Design

This chapter presents the overall design and functional requirements that have been implemented in the interface. The chapter also presents how the interface has been integrated into the MCSS-environment and a screenshot of the interface can be overviewed in Figure 9.

Features will not be described in detail, and code review in this chapter is avoided in order to keep a concise format. For further information about the code and the implementation, see the appendices.

Figure 9 An overview of the interface for the measurement oxygen saturation in 5-weeks view.

5.1 Web- and modular design


Furthermore, a modular design of the code, following well-known principles such as low coupling and high cohesion, is approached. Even though the code lacks classes, a modular design where the code is loosely coupled is important. A modular system is a type of architecture where functionality can be added or replaced without affecting the rest of the system. Since the interface does not consist of physical components and is rather integrated in one big script, a true modular architecture is difficult to reach. However, the approach can still be taken in order to eliminate the repetitive need to change code in several places when changes are introduced. To avoid this repetitive behavior, the code is factorized into a main rendering function. By calling this function with dedicated arguments relevant to the different views, unnecessary code is eliminated and a more modular design is approached.
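The factorized main rendering function can be illustrated with a sketch. All names and configuration fields below (renderView, VIEWS, tickUnit) are hypothetical; the point is only that view-specific settings travel as arguments into one shared function instead of being duplicated per view.

```javascript
// Sketch of the "one main rendering function" idea: each view contributes a
// small configuration object, and a single function derives the parameters
// the shared rendering code would use. Names are hypothetical.
const VIEWS = {
  fiveWeeks: { days: 35, tickUnit: "week" },
  week: { days: 7, tickUnit: "day" },
  day: { days: 1, tickUnit: "4h" },
};

function renderView(viewName, data) {
  const cfg = VIEWS[viewName];
  const end = data[data.length - 1].timestamp; // last measurement closes the window
  const start = new Date(end.getTime() - cfg.days * 24 * 60 * 60 * 1000);
  // The actual D3 drawing calls would go here; this sketch just returns the
  // parameters the shared rendering code would consume.
  return { start, end, tickUnit: cfg.tickUnit };
}

const data = [
  { timestamp: new Date("2020-05-25T10:00:00Z"), value: 95 },
  { timestamp: new Date("2020-06-01T10:00:00Z"), value: 97 },
];
const view = renderView("week", data);
console.log(view.tickUnit); // prints: day
```

Adding a fourth view then amounts to adding one entry to the configuration table, which is the loose coupling the text aims for.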

5.2 Functional requirements

One of the primary purposes of the interface is to be scalable and to limit the effort needed when adding other types of measurements. For example, it should be unproblematic to visualize measurements other than the Covid-19 measurements. Furthermore, it should be possible to change the time view, allowing the user of the interface to interact with the data visualization. Hence, the following requirements were identified:

• View rendering

• Adaptable axes

• Scatter plots

• Static and dynamic toggling

• Snap grid hovering and tooltips

• Responsiveness

To understand how the functional requirements have been implemented, a general understanding of how D3.js functions is sufficient. It is mentioned briefly in the theory section with referenced sources for further reading, but the principle of D3.js is, briefly, to append sub-groups, or children, to a parent SVG, building on the DOM.

5.2.1 View rendering and adaptable axes

Since the interface should assist the viewer with data visualizations, the viewer should be able to interact with it by toggling between desired time intervals to get a better overview of the specific time interval. There are three main views available for the data visualizations, and they are reached with button clicks presented in Figure 10.


The views are sorted after the following requirements:

• A 5-weeks view allows a longer time frame for the viewer to access history and search for deviations occurring over a longer time span.

• A week view offers an overview of the current condition.

• A day view offers a more precise view of the current day.

Furthermore, the interface has two axes: a vertical y-axis with values and a horizontal axis with dates, along with a date label summarizing the rendered time interval that the user is presented with.

The vertical axis is static and depends on the visualized measurement, which is often a numerical value. The horizontal axis is not static and will differ visually to correspond to the requested time view. This means that the horizontal axis will change its appearance between three formats, depending on which time view is being rendered. Furthermore, in each view the axis visually categorizes the data into an interlaced color pattern, which is introduced more thoroughly in the next section. In addition, a date label that summarizes the visualized time frame is present underneath the axis and is updated automatically upon view rendering. The date label is formatted as the day number, active month name and numerical year, and is updated with different function calls depending on whether the view renders an interval within a day or a longer time interval, to present a correct summary to the viewer.
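As a sketch of the date label format just described (day number, month name, numerical year), a small formatter could look like the following. The function name and the interval handling are illustrative assumptions; a fixed month table keeps the output locale-independent.

```javascript
// Sketch: formatting the summarizing date label as "day monthName year",
// as a single date for the day view or an interval for the longer views.
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

function dateLabel(from, to) {
  const fmt = (d) => `${d.getDate()} ${MONTHS[d.getMonth()]} ${d.getFullYear()}`;
  // A day view only needs one date; longer views show the whole interval.
  return from.getTime() === to.getTime() ? fmt(from) : `${fmt(from)} - ${fmt(to)}`;
}

console.log(dateLabel(new Date(2020, 3, 27), new Date(2020, 4, 31)));
// prints: 27 April 2020 - 31 May 2020
```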


5-weeks view

This view was originally designed to present a monthly view. However, due to the inconsistency in the length of a month, a set 5-weeks view was introduced instead. The main purpose is to make the view easier to implement by following a set number of weeks, rather than finding ways to render partial weeks without issues. Consequently, the horizontal axis visually categorizes data into week numbers of the year, which is a more consistent format with regard to the other views of the interface. Figure 11 shows the axis and summarized date label for the 5-weeks view.

Figure 11 Showing the horizontal axis and corresponding date label in the 5-weeks view.
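The week numbers used to label the 5-weeks axis can be derived with the standard ISO-8601 week algorithm. This is a generic sketch, not the project's code: the date is shifted to the Thursday of its week, and weeks are counted from the start of that Thursday's year.

```javascript
// Sketch: ISO-8601 week number of a date, as used for week-based axis labels.
function isoWeek(date) {
  const d = new Date(Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()));
  const dayNum = d.getUTCDay() || 7;         // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - dayNum); // move to Thursday of this week
  const yearStart = new Date(Date.UTC(d.getUTCFullYear(), 0, 1));
  return Math.ceil(((d - yearStart) / 86400000 + 1) / 7);
}

console.log(isoWeek(new Date(2020, 5, 2))); // prints: 23 (June 2, 2020)
```

The Thursday trick handles the year-boundary cases, such as the first days of January belonging to the last week of the previous year.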

Week view

In this view, the horizontal axis visually categorizes data into weekdays. The day of the last measurement is shown as the last day on the axis, with the preceding days calculated in order to create a week view. Figure 12 showcases the axis and summarized date label of the week view.

Figure 12 Showing the horizontal axis and corresponding date label in the week view.

Day View

The day view has the horizontal axis visually categorizing data into grouped time intervals of 4 hours each, see Figure 13.
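Grouping timestamps into the day view's 4-hour intervals amounts to integer division of the hour; a minimal sketch (the function name is illustrative):

```javascript
// Sketch: mapping a timestamp to one of the six 4-hour bins of a day
// (00-04, 04-08, 08-12, 12-16, 16-20, 20-24).
function fourHourBin(date) {
  return Math.floor(date.getHours() / 4); // 0..5
}

const bins = [new Date(2020, 4, 2, 9, 15), new Date(2020, 4, 2, 23, 50)]
  .map(fourHourBin);
console.log(bins); // bins are 2 and 5
```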


5.2.2 Scatter plot

With the decided model set as a scatter plot, the data points are visualized as dots, seen in Figure 9. Each dot corresponds to the two labels in the JSON file: the first label, the value of the measurement, is assigned to the y-coordinate of the dot, and the timestamp is assigned to the x-coordinate. The dots are then created as children to the parent SVG and appended depending on the data flow bound to the visualization. The dots can be appended generically, by using the same naming convention as the labels of the data file. If the scatter plot has a lot of data values, especially close to each other, the user may have to deal with an overpopulated and distorted graph. In a situation where a graph is overpopulated and thus difficult to draw any conclusions from, an opacity filter can be introduced in order to more easily differentiate scatter points that are close to each other. The scale used for the opacity filter ranges between 0 and 1, where 0 represents fully transparent and 1 fully opaque. The value of 0.8 was chosen, which is close to fully opaque, in order not to lack color intensity, so the user would not have difficulties viewing the data points, but with the advantage of being able to differentiate the scatter dots in the case of an overpopulated graph. The opacity filter is set directly in the script, rather than by CSS, since it allows the dots to show shadows, distinguishing the dots from each other in the case of close positioning.
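The positioning of the dots is essentially what D3's linear scales perform under the hood: a linear mapping from the data domain to the pixel range. The sketch below shows that mapping with illustrative domain and range values (the helper name is an assumption, not a D3 API).

```javascript
// Sketch: a linear scale maps a data value in [d0, d1] to a pixel in [r0, r1],
// which is how timestamps become x-coordinates and values become y-coordinates.
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

const x = linearScale([0, 24], [0, 600]);   // hours of the day -> pixels
const y = linearScale([80, 100], [400, 0]); // saturation %     -> pixels (inverted,
                                            // since SVG y grows downwards)

console.log(x(12), y(90)); // prints: 300 200
```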

5.2.3 Static or dynamic toggling

The D3.js library allows animated transitions with timed durations for dynamic toggling between the different time views. This was initially planned, since it is a likable feature and was well received at the constituent. However, the feature proved inconsistent with other design aspects and produced bugs that were hard to solve.

Instead, further focus was dedicated to static toggling, with a new view loading directly upon clicking. Focus was also set on one of the design aspects that conflicted with the transitioning feature: the interlaced coloring of grids seen in Figure 9. The grids create a pattern of straight lines that cross each other, forming rectangles. Their primary purpose is to highlight sections of interest by varying the color shade, offering easier orientation. By coloring the space between every second vertical grid line originating from the x-axis, the interlaced pattern attempts to help the viewer focus on a more specified information window.


5.2.4 Snap grid hovering and tooltips

A graph can easily become distracting if a lot of information is featured at once in the view. Hence, hovering over a data point can provide a more elegant approach to expanding the information scope for the viewer. There were several ways of tackling the design, such as enlarging the scatter points upon hovering or introducing the concept of a snap grid.

A snap grid is a vertical line intersecting the graph through the scatter plot, calculated as closest in distance to the position of the current mouse pointer. The snap grid is present only upon hovering within the SVG of the graph area. In addition, a tooltip box shows up with more detailed information regarding the exact value and time stamp of the registered measurement, see Figure 14. The tooltip box is slightly transparent to allow covered scatter points to still be visible.

To achieve the expected behavior of having a snap grid that intersects at the right place, mathematical calculations were required. The D3.js library has a function call d3.bisector [55]. The concept of bisecting allows the highlighting of the closest data point when moving the mouse pointer within the graph, shown in Figure 14. The d3.bisector is based on the searching algorithm binary search and is applied to recover the closest x-coordinate in the dataset with the help of the mouse position. Since binary search only works on sorted lists or arrays, the data flow needs to be sorted. If data is fetched from a database, it can be sorted upon fetch, but with randomly generated mock data that is not sorted, problems are expected. These were quickly solved by adding functionality that sorts the timestamps chronologically at the start of the data binding.


5.2.5 Responsiveness

In the first prototypes, the height, width and margins of the SVG element were set to hardcoded values to match the device that the interface was implemented on. There are no apparent advantages in depending on fixed values alone, and responsiveness is a feature that is appreciated in most contexts. A responsive property is also required for the integration of the interface, to avoid unexpected behavior when the interface is accessed on different devices with different screen sizes or web browsers.

In D3.js, there are several approaches to creating a chart or graph that is responsive. The chosen approach in this interface was to create a function, described in the API docs, that collects the SVG aspect ratio and preserves it. Upon initial page load, and when the SVG container changes, the SVG will preserve the calculated ratio and resize the graph as needed. Figure 15 shows the preserved ratio of the graph when viewed on a phone screen. The interface is not fully adjusted to be viewed on such small devices, but the buttons adapt to a more vertical design to give easier access for toggling. The graph will, however, keep the responsive property, preserving the ratio to avoid faulty visualizations.
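The preserved-aspect-ratio behavior reduces to a small calculation: capture the original ratio once and derive new heights from it. In the real interface this is driven by the SVG viewBox and resize events; the sketch below, with hypothetical names, only shows the ratio arithmetic.

```javascript
// Sketch: capture the SVG's original aspect ratio once, then compute resized
// dimensions from any new container width so the graph never distorts.
function makeResizer(origWidth, origHeight) {
  const ratio = origHeight / origWidth;
  return (newWidth) => ({ width: newWidth, height: Math.round(newWidth * ratio) });
}

const resize = makeResizer(600, 400); // original SVG dimensions (illustrative)
console.log(resize(300)); // half-width phone screen keeps the 3:2 ratio
```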


5.3 Integration to Medication and Care Support System

By integrating the interface into a more substantial context, such as the constituent's system MCSS, compatibility with both the environment and real data from databases is tested. Missing functionality or bugs may appear when integrating, allowing further customization of the interface. This step can be considered the finish of the design process of the interface within the scope of the thesis, while attempting to create a minimum viable product, MVP.

5.3.1 Minimum Viable Product

A minimum viable product is a version of a product with just enough features to make the product deployable. Since it is less expensive to develop a product with fewer features, the costs and risks are reduced should the product turn out to be a failure. [56]

5.3.2 Release

When features or components are introduced in MCSS, they become a part of a release. A release can be defined by different stages, dependent on how ready the feature is for production. The advantage of releasing a feature in an early stage, only available in the developing mode of the system, is that additional developers and testers have access and can evaluate it. The last required steps for the design to become an MVP can then be identified and further worked on.

5.3.3 Integration troubleshooting

Some issues with the original code were identified when integrating the data visualization interface into the system:

• Data binding issues

• Snap grid hover bug

• CSS class inconsistencies

Data binding issues

The majority of issues faced when integrating were related to the data binding, and were solved by introducing a more generic approach towards customizing the features related to the data. For example, instead of the previously hardcoded minimum and maximum values of the axes, the values can be set based on the data flow that the visualizations are bound to.


Another example of a more generic approach, taken after realizing that the data binding was partial, was that the unit symbol of the y-axis can be fetched from the data flow. The unit symbol depends on the measurement, and a generic approach of fetching the symbol allows any measurement to be presented with the correct unit, both in the y-axis label and in the hover tooltip. See Figure 16 for an example of a generic unit symbol in the y-axis label.

Figure 16 The unit symbol of percentage due to the currently visualized measurement being oxygen saturation.

The data issues helped identify missing functionality, allowing a more generic interface. Other identified issues were scatter points present outside the graph area, due to rendering issues with the scatter points. This was fixed either by limiting the data flow to match the time range of the graphs, or by adding the scatter points to the render functionality with enter and exit stages. In addition, a filter of the scatter points was introduced, in order to only show values within the range of the timestamps of the x-axis.
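The filter keeping scatter points within the x-axis timestamp range can be sketched as a simple predicate over the data flow; the names below are illustrative, not the project's code.

```javascript
// Sketch: keep only data points whose timestamps fall within the axis range,
// so no dots render outside the graph area.
function withinRange(points, start, end) {
  return points.filter((p) => p.timestamp >= start && p.timestamp <= end);
}

const points = [
  { timestamp: new Date("2020-05-01"), value: 95 },
  { timestamp: new Date("2020-05-20"), value: 97 },
];
const week = withinRange(points, new Date("2020-05-18"), new Date("2020-05-25"));
console.log(week.length); // prints: 1
```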

The latter fixes allowed more liberty in the data gathering, since the data flow does not need to be further restricted. However, it is good praxis not to let the JSON files grow huge over time, since they would require big JSON requests when the interface binds data.

Snap grid hover bug


CSS-class inconsistencies

The CSS-classes used for the graph conflicted with already existing classes used by the system and needed to be renamed in order to:

1. Follow the naming convention of MCSS created by front-end developers.

2. Avoid duplicate names for different CSS classes.


6 Result Summary

The purpose of the degree project was to gain a better understanding of web-based data visualizations and of how an interface for these visualizations can be designed, developed and evaluated. This chapter summarizes the results presented in chapters 4 and 5 and then attempts to answer the research questions of the thesis.

6.1 Data Visualization Interface: Prerequisites

The following prerequisites were considered in a DDDM approach before designing the web interface, as presented in chapter 4:

• Data gathering, with the majority of the data flow being mocked with a realistically set coefficient of variation, CV. The mocked data was initially generated in CSV format but then changed to JSON, to follow the constituent's data convention.

• A primary choice of model, the scatter plot, since continuous data cannot be ensured and individual data points are suited to showing the relationship between two variables, in this case a time stamp and a numerical value.

• The choice of web library for visualizations: D3.js, since it is established within the data visualization field with a generous documentation platform. Also, it is a JavaScript library, which makes it appropriate within the front-end development field.

• A risk analysis dedicated to the medical context of the visualization interface relevant to the Covid-19 parameters and other MCSS measurements.

6.2 Data Visualization Interface: Design

In the final prototype of the data visualization interface, following features were designed and implemented, presented in chapter 5:

• Data values represented as data dots in the graph canvas creating a scatter plot, fulfilling the prerequisite of the chosen model.

• Buttons for toggling between rendered views, which allows user interaction.

• Generic axes: a dynamic x-axis formatted on a time scale, and a static y-axis customizable to changes in the data.

• Grid lines creating an interlaced pattern of rectangles, for easier tracing of data dots.

• Snap grid hovering utilizing a bisector-function with a tooltip providing additional information, without giving the graph canvas a cluttered look.


6.3 Answering the research questions

The research question stated in the thesis was “How can a web-based interface for data visualization be created?”, which was then further divided into sub-questions. This section attempts to answer the sub-questions with the results from the study.

The presented results are based on a degree project focused on creating a data visualization interface intended for the constituent's Covid-19 measurements, with scaling possibilities to allow visualizations of added measurements. However, in order to answer the general research questions of the thesis, identified patterns and learnings from the study can be applied in more general circumstances, beyond the particular instance of the thesis' literature and case study.

Based on the results from the study, the following conclusions have been deduced to answer the stated questions.

SRQ1. What are the prerequisites needed for visualizing data on the web?

Based on the literature study, several prerequisites can be identified before visualizing data on the web. The choice of web library should be compatible with the desired programming language, while providing enough documentation online to compensate for potential gaps in experience when implementing; this proved to be beneficial later on in the case study. The choice of model should also be carefully considered with regard to the data gathering, which should be settled to avoid disagreements later during the design and implementation of the interface. Furthermore, if the source of data is incomplete, the data gathering can be supplemented with a made-up flow, also called mocked data, in order to provide a basis for the visualizations.

Finally, a risk analysis proved to be valuable, since it helps eliminate unnecessary efforts and avoid spending resources on features that have been identified as redundant.

SRQ2. How can data visualizations be designed and implemented into an interface through a web library?

The case study showed that data visualizations based on the chosen web library could be designed and implemented to help the constituent with an interface for Covid-19 measurements. However, several features can be implemented in assisting the user of any data visualization interface.

References
