
Spring term 2020 | LIU-IDA/LITH-EX-G--20/052--SE

Implementation of visualizations

using a server-client architecture

Effects on performance measurements

Pia Løtvedt

Tutor: Jody Foo


Copyright (Upphovsrätt)

This document is held available on the Internet – or its future replacement – for a period of 25 years from the date of publication, provided that no exceptional circumstances arise.

Access to the document implies permission for anyone to read, to download, and to print out single copies for individual use, and to use it unchanged for non-commercial research and for teaching. Transfer of the copyright at a later date cannot revoke this permission. All other use of the document requires the author's consent. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature.

The author's moral rights include the right to be mentioned as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or individuality.

For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of

25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download,

or to print out single copies for his/her own use and to use it unchanged for non-commercial research and

educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the

document are conditional upon the consent of the copyright owner. The publisher has taken technical and

administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is

accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for

publication and for assurance of document integrity, please refer to its www home page:

http://www.ep.liu.se/.


Implementation of visualizations using a server-client architecture

– effects on performance measurements

Pia Løtvedt

Linköping University Linköping, Sweden

pialo059@student.liu.se

ABSTRACT

Visualizing large datasets poses challenges in terms of how to create visualization applications with good performance. Due to the amount of data, transfer speed and processing speed may lead to waiting times that cause users to abandon the application. It is therefore important to select methods and techniques that can handle the data in as efficient a way as possible. The aim of this study was to investigate if a server-client architecture had better performance in a visualization web application than a purely client-side architecture in terms of selected performance metrics and network load, and whether the selection of implementation language and tools affected the performance of the server-client architecture implementation. To answer these questions, a visualization application was implemented in three different ways: a purely client-side implementation, a server-client implementation using Node.js for the server, and a server-client implementation using Flask for the server. The results showed that the purely client-side architecture suffered from a very long page loading time and high network load but was able to process data quickly in response to user actions in the application. The server-client architecture implementations could load the page faster, but responding to requests took longer, whereas the amount of data transferred was much lower. Furthermore, the server-client architecture implemented with a Node.js server performed better on all metrics than the application implemented with a Flask server. Overall, when taking all measurements into consideration, the Node.js server architecture may be the best choice among the three when working with a large dataset, although the longer response time compared to the purely client-side architecture may cause the application to seem less responsive.

KEYWORDS

Client-server, system architecture, visualization, web performance

1 Introduction

Throughout the last few decades, the ability to capture and store data has improved drastically, which allows for the creation of massive datasets. Dobre and Xhafa [1] estimated in 2014 that 2.5 quintillion bytes of data were generated every day, and that 90 % of the data existing at that time had been generated during the last

two years. This number is likely to only continue increasing. The huge availability of data leads to novel challenges in regard to processing, analysing and visualizing it in meaningful ways [2]. For data visualization using dynamic visual displays, web applications are a common choice of platform. The available tools for tailoring presentation of data in browsers are abundant, both through the use of CSS for styling and through JavaScript libraries for more powerful data processing and display. In particular, there are numerous JavaScript libraries available for implementation of dynamic visualizations.

However, when working with visualizations of big data, the nature of the data and the amount of it may lead to efficiency problems. For example, downloading a large dataset into a browser and then processing it before using it in a visualization may lead to significant time delays before the visualization may be available. This in turn may lead to the observer losing interest while waiting, and therefore choosing to leave the web site. It is therefore important to implement visualizations in such a way that they can be displayed within a reasonable amount of time [3, 4, 5].

An alternative to handling all data on the client side could be to implement a server-client architecture, where data processing is handled on the server side and only relevant data is transmitted to the client. This would decrease the time needed to download the data to the client, and as data processing may be faster on the server side, the overall time needed before the visualization can be displayed is likely to be decreased.

The selection of programming languages and tools used for the implementation may also affect the performance of the resulting application, since the efficiency of different languages may vary [6, 7].

2 Aim

The aim of this study was to investigate performance of a visualization web application when implemented in three different ways, using two different types of architectures and with different implementation languages for a part of the architecture.

The following research questions were formulated:

1 Does a server-client architecture for a visualization web application perform better than a client-side architecture in terms of selected performance measures and network load?

2 Are there differences in the performance of server-client architectures implemented using different web technologies on the server side?

3 Theory

Big data and its challenges

The term big data is usually credited to John Mashey [8]. This type of data is often characterized by the five Vs [9]:

- Variety – data exists in many different formats, such as text, images, audio, etc.

- Velocity – data is generated at high speeds

- Volume – datasets are very large

- Value – the data can be used for purposes such as achieving specific goals

- Veracity – data may be of varying quality

These characteristics lead to challenges in terms of handling the data using classical computing paradigms. For example, traditional relational (SQL) database systems struggle to deal with data volumes in the order of terabytes and petabytes. Transferring large amounts of data can also be challenging and time consuming: downloading 1 terabyte at a download speed of 100 Mbit/s takes roughly 22 hours even under ideal conditions. However, when handled and analyzed properly, big data may be very informative and profitable [10], and so there is great potential in finding ways to deal with large amounts of data.
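The back-of-the-envelope arithmetic behind that transfer-time figure can be sketched as follows. The link speed and the assumption of full utilization are illustrative; real transfers carry protocol overhead that pushes the time closer to a full day:

```javascript
// Ideal-case transfer time for 1 TB over a 100 Mbit/s link.
// Assumes full link utilization and no protocol overhead.
const TERABYTE_BITS = 1e12 * 8;     // 1 TB = 8e12 bits
const LINK_BITS_PER_SECOND = 100e6; // 100 Mbit/s

const seconds = TERABYTE_BITS / LINK_BITS_PER_SECOND; // 80000 s
const hours = seconds / 3600;

console.log(`${hours.toFixed(1)} h`); // ≈ 22.2 h
```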

Data visualization

The field of data visualization centres on presenting information in a graphical form. Displaying data in charts or graphs allows the observer to investigate, explore, interact with and answer questions using data that would otherwise be difficult to grasp. For this reason, creating good visualizations is often an important goal when presenting data.

There are many challenges related to creating good visualizations. Choice of visualization type and design, choice of platform for display and choice of interaction techniques are some of the decisions that should rely on informed knowledge about user psychology and behaviour. If the visualization is to be displayed live and with interaction opportunities, another challenge is how to efficiently handle data processing and response to user actions. When working with large datasets, implementation details become even more important in creating an efficient application. Since there is a larger amount of data to process, this may become very time consuming. Using a good system design and appropriate algorithms and tools can make the difference between a well-working application and one that users will not want to use.

Performance measurements

With the growth of web technologies and internet usage, web performance has become one of the most important

factors determining the success of a web page. Research from 2004 showed that users are willing to wait 2 seconds for a web page to respond, at least for simple information retrieval tasks [11]. Within e-commerce, it has repeatedly been shown that the performance of web pages plays a crucial role in profitability [12], and results from Think with Google showed large increases in user bounce probability as page load time increased from 1 s up to 10 s [13]. It has also been revealed that page speed is used in search engine rankings [14].

One of the most common performance metrics for web performance is the page load time. The page load time can be defined as the time between when the user requests a page and when that page is fully rendered in the browser [15]. Other web performance metrics that may affect the user experience are the time it takes the page to respond to user actions and the network load incurred by the web page.

Long page load times have been shown to lead to users abandoning the webpage before it has finished loading [13], and it is generally recommended to avoid long loading times. However, as web pages become increasingly complex, it may be difficult to keep the loading time from becoming longer.

There are many factors affecting the performance of a web page. Among these are the size of the resources that must be downloaded to the page, the network conditions, the number of requests that must be performed to retrieve resources and the scheduling of different stages of the page load process [16]. Another important factor for web performance is the platform on which the web page will be viewed, as it has been shown that there is a large difference between the performance in mobile browsers and desktop browsers, particularly as the web page increases in size [17].

A large number of tricks to optimize the loading time exist, such as reducing file sizes through compression or simply reducing character count, combining several files into one and loading CSS files before JavaScript files [15]. Another possibility is to load the web page progressively, for example by only loading resources at the moment they are needed. This could mean only loading images as they are to be shown on the screen. Such progressive, or “lazy”, loading, is recommended as best practice by Google to improve web page performance [18].

Other tricks take advantage of user behaviour and psychology to reduce the perceived loading time. One example is including a feedback bar to show loading progress, which has been shown to prolong the user's patience when waiting for the page to load [7]. Research also shows that the perceived loading time can be reduced by predicting the user's eye gaze pattern when entering the web page and prioritizing loading of the resources the user will see first [18].

Related work

In an early paper on web visualization applications, Jern [19] discussed advantages and disadvantages of using a so-called “thin” client that only downloads web plugins or components to the browser and is not capable of own program execution. In this

model, all data manipulation would be controlled by a backend server. He claimed that this would lead to reductions in costs of software and maintenance, as developers would no longer need to create software that could run locally on the clients and support multiple operating systems. However, when his paper was published in 1998, he suggested that real-time interaction with the data was not possible with this model, since there were issues with transferring processed information into HTML. In this case, he suggested transferring a reduced amount of raw data to the client and having the client process the data for the visualization.

Janicki and colleagues [20] created a client-server application to allow users to view and interact with visualizations of ant species distribution. They did not compare their design to that of a pure client architecture, nor did they conduct any explicit performance measurement, but they note that performance was improved through use of good relational database structure, and their application was deployed successfully.

Wu et al [21] noted that even with a traditional client-server architecture, the common visualization tools and libraries, such as D3.js, still require some amount of data processing in the client, which leads to poorer performance when working with large datasets. They therefore investigated a new workflow intended to move as much work as possible from the client to the server to improve performance. The goal of this workflow was to perform all calculations on the server and then transfer only the resulting images to the client. They found that while this workflow might perform worse than the traditional server-client workflow when dealing with small datasets, the traditional workflow suffered large drops in performance when the amount of data increased, whereas their new workflow could still manage to display visualizations with decent performance.

Wessels et al [22] worked with a similar model of generating visualizations on the server side and then sending images to the clients. In their study, they used web sockets to maintain open connections to the clients and send the images, and they suggest this is an ideal way of minimizing network overhead as seen with a regular server. This might be particularly useful when new data arrives to the server frequently, as it avoids the client having to open new connections and polling the server. However, they only outlined how the system would be implemented without actually creating such a system, and so no conclusions can be made about the performance of this architecture.

4 Method

Approach

To answer the research questions, a visualization application was implemented in three different ways, using two different types of architectures and different programming languages and tools. One implementation was a purely client-side architecture where only the dataset was retrieved from the server and all subsequent processing took place in the client’s web browser. The other two implementations used a server-client architecture where data

processing requests were sent from the client to the server. The server handled all data processing and responded to the client with the finished data to be displayed in visualizations. This second type of architecture was implemented in two ways, one with a backend server written in Node.js while the other had a backend server written in Python3 with Flask.

The performance of these three implementations was then compared through measurements of page loading time, user action response time and network load.

Delimitations

Although this study worked with a static dataset, the implementations were done as if new data could arrive during and between sessions. This meant that there was no caching of data between client sessions, and that pre-processing of data was only allowed to such an extent that pre-processed data would not have to be recalculated in the event that new data should arrive.

Dataset

To allow for implementation of visualizations requiring processing of data, a sufficiently large and detailed dataset had to be used. Spatiotemporal data contains information about events occurring at specific locations and at specific points in time. It is among the most common types of data, and spatiotemporal datasets are found in such varied fields as climate science, neuroscience, agriculture, epidemiology, social media, traffic dynamics and crime data, among others [23]. If analyzed and utilized correctly, spatiotemporal data may be tremendously informative, and it can be a great asset for example in exploratory analyses and decision-making. Due to the multidimensional nature of this type of data, it is beneficial to allow for dynamic manipulation of the visualization by the observer [24]. A large spatiotemporal dataset would allow for meaningful and demanding data processing such as averaging values over regions or extracting datapoints from specific time periods.

The dataset used in this study was a collection of historical temperature measurements from 3448 unique cities in 159 countries. Earth surface temperature values and temperature uncertainty in degrees Celsius were provided for each city monthly. Information about city location and country was also included on each row. The earliest measurement was from November 1743 and the last was from September 2013; however, not all cities had monthly measurements for the entire time interval. The dataset also contained rows where temperature data was missing. The full dataset contained 8235082 rows. It was formatted as a CSV, and the total size was 520 MB. The dataset was part of a collection of several different datasets containing raw data from the Berkeley Earth data page, and it was acquired from Kaggle [25].

For ease of development of the applications and subsequently for measuring page loading time when using a smaller sized dataset, the large dataset was truncated at 10% of its original size, resulting in a dataset of 821676 rows and a size of 55 MB.


Implementation

System specifications

The implementations were developed and run on an HP Pavilion x360 Convertible 14-ba1xx with a 64-bit Windows 10 operating system, an Intel Core i7-8550U processor and 12 GB RAM. Testing of the application implementations was performed on the same system using Google Chrome v 81.0.4044.138.

Layout and function of visualization application

The design of the components of the visualization application can be seen in figure 1 and figure 2. The application consisted of a choropleth map showing the entire world with country borders (Figure 1). Each country was filled with a color based on the average temperature of that country at a specific timepoint, calculated as the average of the temperatures of the cities in the country at that timepoint. A slider below the map could be used to set the timepoint for the map. When zooming in on the map, markers showing the location of cities were shown. These markers were filled with a color denoting the temperature of each city at the timepoint selected in the slider.

When clicking on a country, the map view would zoom in on that country, and simultaneously, the linechart component would update to show data for that country (Figure 2). The linechart component was placed below the map and had a selection of different types of linecharts to show (average yearly temperature for the country, average yearly temperature for the cities of that country, the yearly deviation of each city's temperature from the country average, and average temperatures in each month of the year). The different types of linechart could be selected using radio buttons placed above the linechart. When clicking on a city marker in the choropleth map, the linechart component would update to show data for that particular city, with a selection of two different types of linechart (average yearly temperature for the city and average temperatures for each month).

Figure 1: Choropleth map of the world with country colors based on country temperature at specific timepoints. Below the map is a slider used to set the timepoint to be shown on the map.

Figure 2: Linechart component of the visualization application, here showing average yearly temperature for the United Kingdom.

Overall design of implementations

The overall architectures of the different types of implementation are shown in Figure 3. On the client side, all three implementations had a visualization layer and a data layer. The visualization layer of the three implementations was the same, whereas the data layer was the same in the two server-client architectures, but different in the pure client architecture. The implementations also had a backend server, but this was implemented differently in each system. The specifics of the implementations will be described in more detail in the following sections.

Figure 3: General layout of the two different types of architecture used in the study; the purely client-side architecture and the server-client architecture. There were two different implementations of the server-client architecture, and the visualization layer was the same in all three implementations.

Visualization layer

The visualization layer, which was the same in all three implementations, consisted of HTML, CSS and JavaScript files and was responsible for displaying the visualizations and handling

user interactions. It communicated with the underlying data layer. To display visualizations, the D3.js library1 was used.

Pure client architecture: data layer and backend server

The data layer of the pure client architecture was implemented in JavaScript and consisted of a class with methods to handle various data processing requests from the visualization layer. Upon initialization, the dataset was loaded from the server and pre-processed to create two separate hashes. One contained all countries as keys and lists of cities and their locations in each country as values. The other hash had all cities as keys and lists of measurements as values. These hashes would not require any new calculations should new measurements arrive for any city, and so would still be useful if the data was updated.

The data layer also contained methods used for e.g. calculating average yearly temperature for a country, calculating yearly temperature curve for cities or countries, or retrieving temperatures for all cities on a specific timepoint.
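As a rough sketch, the pre-processing described above might look like the following. The row field names are assumptions based on the dataset description, not the study's actual code:

```javascript
// Sketch of the pure-client pre-processing step: build two lookup
// structures from parsed CSV rows. Field names (city, country, lat,
// lon, date, temperature) are assumed from the dataset description.
function preprocess(rows) {
  const citiesByCountry = new Map();   // country -> [{ city, lat, lon }]
  const measurementsByCity = new Map(); // city -> [{ date, temperature }]

  for (const row of rows) {
    if (!citiesByCountry.has(row.country)) citiesByCountry.set(row.country, []);
    const cities = citiesByCountry.get(row.country);
    if (!cities.some(c => c.city === row.city)) {
      cities.push({ city: row.city, lat: row.lat, lon: row.lon });
    }

    if (!measurementsByCity.has(row.city)) measurementsByCity.set(row.city, []);
    // The dataset contains rows with missing temperature values; skip them.
    if (row.temperature != null) {
      measurementsByCity.get(row.city).push({ date: row.date, temperature: row.temperature });
    }
  }
  return { citiesByCountry, measurementsByCity };
}
```

Because new measurements would simply be appended to the relevant city's list, neither structure needs to be recalculated when new data arrives, in line with the delimitations above.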

It was necessary to use a server in the pure client architecture to enable serving of the dataset file. For this reason, a local development server was set up through the Live Server2 extension

for Visual Studio Code3. Its only purpose was serving files,

whereas all data processing occurred in the client.

Server-client architecture: data layer

Both server-client architecture implementations used the same data layer, which was implemented in JavaScript. Its purpose was to serve as an intermediary between requests from the visualization layer and the backend server. It consisted of a class with a number of methods which performed Ajax calls to server endpoints, received the response and returned the data to the visualization layer.
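A minimal sketch of such an intermediary is given below; the endpoint path and the injectable fetch function are illustrative assumptions, not the study's actual code:

```javascript
// Sketch of the shared server-client data layer: a thin wrapper that
// turns visualization-layer calls into requests against server API
// endpoints and hands the parsed response back.
class DataLayer {
  constructor(fetchFn = fetch) {
    this.fetchFn = fetchFn; // injectable, e.g. for testing with a mock
  }

  // Hypothetical endpoint: average temperature per country at a timepoint.
  async countryAverages(timepoint) {
    const res = await this.fetchFn(`/api/country-averages?date=${encodeURIComponent(timepoint)}`);
    if (!res.ok) throw new Error(`Request failed: ${res.status}`);
    return res.json(); // parsed data for the visualization layer
  }
}
```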

Server-client architecture: Node.js server

One of the server-client architecture implementations used a server written in Node.js4, with Express5 as its web application

framework. Upon starting the server, the dataset was read into memory and brief pre-processing similar to that of the pure client data layer occurred. The server defined several API endpoints for the data requests that would arrive from the client’s data layer. The data processing for these endpoints used code similar to that of the pure client architecture’s data layer.
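The data processing behind one such endpoint might be sketched as follows; the function name, the row fields and the commented-out Express route are illustrative assumptions rather than the study's actual code:

```javascript
// Sketch of the server-side processing for one hypothetical endpoint:
// average temperature per country at a given timepoint, computed from
// the in-memory dataset rows.
function countryAveragesAt(rows, date) {
  const sums = new Map(); // country -> { total, count }
  for (const row of rows) {
    if (row.date !== date || row.temperature == null) continue;
    const entry = sums.get(row.country) ?? { total: 0, count: 0 };
    entry.total += row.temperature;
    entry.count += 1;
    sums.set(row.country, entry);
  }
  const averages = {};
  for (const [country, { total, count }] of sums) {
    averages[country] = total / count;
  }
  return averages;
}

// A hypothetical Express mounting of this endpoint might look like:
// app.get('/api/country-averages', (req, res) => {
//   res.json(countryAveragesAt(rows, req.query.date));
// });
```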

Server-client architecture: Flask server

The other implementation of the server-client architecture used a backend server written in Python 3.7.06. Flask7 was used as the

web application framework. Data processing and manipulation was performed using pandas8. When the server was started, the

dataset was read into memory and pre-processed to a small degree to allow for easier extraction of dates. A number of API endpoints were defined, corresponding to the requests that would arrive from the client's data layer.

1 https://d3js.org/
2 https://marketplace.visualstudio.com/items?itemName=ritwickdey.LiveServer
3 https://code.visualstudio.com/
4 https://nodejs.org/en/
5 https://expressjs.com/
6 https://www.python.org/
7 https://flask.palletsprojects.com/en/1.1.x/
8 https://pandas.pydata.org/

Code

The code of all implementations is available on GitHub9.

Measurements

Three different metrics were used to compare the performance of the different implementations: page loading time, user action response time and network load. In addition, a test suite was defined to simulate a normal user session, and the cumulative response time and network load for the suite were measured. For the page loading time and user action response time, as well as for the cumulative response time in the test suite, each measurement was recorded five times and the average was calculated along with the standard deviation.

Page loading time

To measure the time needed to load the page, start and end times were recorded in the function that initialized the entire application. This included loading of the initial data to be displayed, or in the case of the pure client architecture, the entire dataset. It also included loading and drawing of the choropleth map with initial data. It did not include the loading of the initial HTML, CSS and JavaScript files, but this time is likely to be the same across all three implementations and would only account for a very small amount of time due to small file sizes. Page loading time was recorded both with the full dataset and with the dataset reduced to 10% of the original size.

User action response time

As part of the visualization layer, a benchmarking tool was implemented. This consisted of a function recording a starting time, then executing a callback function and waiting for it to finish, then recording the time after the function finished and reporting the time difference in the browser console. This tool was used to measure the response time for different user actions by supplying the corresponding visualization layer methods as callback functions. Since these methods handled communication with the data layer, waited for results and displayed the resulting changes in the application, the reported time corresponded to the response time for each action. This measurement did not include the initial time needed to fire event handlers, but this time would be the same in all three implementations and additionally would be very short. In addition to the response time for each single action, a cumulative time for all actions performed since loading the page was also reported after each new action had been performed.
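A minimal sketch of such a benchmarking helper is shown below; the names and reporting format are assumptions, but the structure (record start, await the callback, report elapsed and cumulative time to the console) follows the description above:

```javascript
// Sketch of the in-page benchmarking tool: time a callback (the
// visualization-layer handler for a user action), report the elapsed
// time, and keep a running total across the session.
let cumulativeMs = 0;

async function benchmark(label, action) {
  const start = performance.now();
  await action(); // run the handler and wait for it to finish
  const elapsed = performance.now() - start;
  cumulativeMs += elapsed;
  console.log(`${label}: ${elapsed.toFixed(1)} ms (total ${cumulativeMs.toFixed(1)} ms)`);
  return elapsed;
}
```

In use, a visualization-layer method would be passed as the callback, e.g. `benchmark("click Norway", () => showCountry("Norway"))` for a hypothetical handler.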

Network load

To measure network load, that is, the amount of resources transferred over the network, Google Chrome’s DevTools10 were

used. To record the amount of resources used by the page, the Network tab in the DevTools was inspected, and the “resources” metric was noted.

9 https://github.com/MindRoadAB/world-temperatures-visualization
10

Test suite

To gain an understanding of how the visualization application’s performance varied during a user session with the three different architectures, a test suite was designed. This suite consisted of loading the page and performing a set of actions (see Table 1). At the end of the test suite, the page loading time, the cumulative response time and the network load were recorded.

Table 1: Actions to be performed during the test suite.

Action
Load page
On map, click Norway
In linechart, click “Show line for country”
In linechart, click “Show city deviation from country average”
In linechart, click “Show yearly curve”
On map, click India
In linechart, click “Show line for country”
In linechart, click “Show line for cities”
On map, click France
In linechart, click “Show city deviation from country average”
On map, zoom in on USA and click Atlanta
In linechart, click “Show yearly curve”
On map, zoom in on France and click Paris
In linechart, click “Show line for city”
Record time and network load

Figure 4: Time to load the page with the first view of the visualization application as measured with the in-code benchmarking. Results are measured in seconds and are shown for the different implementations and for two sizes of dataset. Error bars indicate standard deviation.

5 Results

Page loading time

Figure 4 shows the time needed to load the page to show the first view of the visualization application. The results showed that the pure client architecture took longer to load the page, both with the full dataset and with the smaller dataset, with a loading time of

Figure 5: Response time (time from user action to resulting change in visualization) measured in milliseconds (ms) for various user actions. Results are shown for the three different implementations. Error bars indicate standard deviation. The choice of Norway and India for measuring response times shows the difference in data processing when the country in question has few or many cities, respectively. (Chart title: Response time from user action to application change; series: Server-client (Flask), Server-client (Node), Pure client.)

29.5 ± 2.8 s when using the full 520 MB dataset.

The Node.js server architecture had the shortest loading time, with a loading process lasting 0.7 ± 0.05 s using the full dataset. The Flask server architecture needed 4.8 ± 0.2 s to load the page when using the full dataset.

Response time for user actions

Figure 5 shows the response times for the three implementations for a selection of user actions, measured as the time it takes to perform the steps needed to display the resulting changes to the visualization application.

Test suite: Total response time

Figure 6 shows the cumulative response time, including page loading time, when running through the test suite, as calculated using the implemented benchmarking tool. The Flask server architecture had the longest total response time, with 4.9 ± 0.3 s needed to load the page and 25.0 ± 0.8 s needed for responses to user actions. The pure client architecture had the second longest total response time, needing 28.1 ± 0.8 s to load the page and 1.4 ± 0.3 s for responses to user actions. The Node.js server architecture needed 1.5 ± 0.6 s to load the page and 2.4 ± 0.2 s to respond to all user actions.

Test suite: Network load

Figure 7 shows the cumulative amount of resources transferred over the network when performing the test suite. The amount of resources transferred was largest for the pure client architecture, with 534 MB. The Node.js server architecture transferred the least amount of resources, 8.9 MB, while the Flask server transferred 12.1 MB.

Figure 6: Cumulative response time in seconds for actions performed during the test suite. Results are shown for the three different implementations. Error bars indicate standard deviation.

Figure 7: Total amount of resources loaded over the network during the test suite. Results are shown in MB. Numbers above each bar indicate the exact amount.

6 Discussion

In this study, it has been shown that performance measures of a visualization application may vary significantly depending on the underlying architecture and on the programming languages and tools used for the implementation. An architecture in which all data is transferred to the client for processing and display can respond rapidly to most user actions, but suffers from a very long loading time due to the large amount of data that must be transferred at the start. On the other hand, an architecture where data processing is located on the server side and requests from the client are answered with only the necessary data has a much shorter loading time, but longer response times due to request handling and data transfer. However, the details of the server implementation make a large difference to the performance of the application, with a Node.js server written in JavaScript performing far better than a Flask server implemented in Python 3.

Client-side versus server-side data processing

Research question 1 asked whether a server-client architecture would perform better than a pure client architecture in terms of the selected performance measures and network load. The results showed marked differences in the performance measures between the pure client architecture and the server-client architectures. Notably, the pure client architecture had a very long page loading time, needing on average almost 30 seconds to display the application, whereas the server-client architectures both needed less than 5 seconds to load the page. This difference is due to the amount of resources that must be transferred over the network as the page is loading, as well as the data processing that occurs before the visualization application can be displayed. In the pure client architecture, the entire dataset had to be transferred and then processed before any data could be shown. In the server-client architectures, the dataset was loaded into memory when the server was started, so this did not add to the page loading time. Although some data processing was needed to serve the initial requests, the amount of data transferred over the network as the page loaded was much smaller (only the information needed to display the initial view of the application), and loading therefore took much less time than in the pure client architecture.
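The contrast between the two data flows can be sketched as follows. The miniature dataset and field names are invented for illustration; the point is only that the pure client serializes the entire dataset up front, while a server-client endpoint answers each request with just the matching rows.

```javascript
// Invented miniature dataset; field names are assumptions for
// illustration only.
const dataset = [
  { city: "Oslo", country: "Norway", year: 2000, temp: 5.1 },
  { city: "Bergen", country: "Norway", year: 2000, temp: 6.2 },
  { city: "Delhi", country: "India", year: 2000, temp: 25.3 },
];

// Pure client architecture: the whole dataset crosses the network
// up front, before anything can be displayed.
const fullPayload = JSON.stringify(dataset);

// Server-client architecture: a request such as
// GET /cities?country=Norway is answered with only the rows
// needed for the current view.
function citiesForCountry(data, country) {
  return data.filter((row) => row.country === country);
}
const partialPayload = JSON.stringify(citiesForCountry(dataset, "Norway"));
```

With the study's 520 MB dataset, this difference between `fullPayload` and `partialPayload` is what separates a 30-second load from a sub-second one.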

Since long page loading times have been shown to increase the likelihood of users abandoning the page [11, 13], the long loading time of the pure client architecture can be considered a large disadvantage. As mentioned previously, there are ways to reduce actual and perceived loading time, for example by compressing data or adding a progress bar, but given the large difference observed between the implementations, this is unlikely to even out the loading times to a sufficient degree. The results also showed that the difference in page loading time became much more severe as the size of the dataset increased. A further complication of loading large datasets into the browser is that some modern browsers have a maximum amount of memory allocated to resources. It therefore seems clear that a server-client architecture is a more natural choice when dealing with larger datasets.

In terms of response time to user actions, the pure client architecture outperformed the two server-client architectures. This is likely due to a combination of the data already existing in the browser, and the speed at which JavaScript can perform calculations, which gave this architecture an advantage over the two server-client architectures. As a result, the application based on the pure client architecture is more responsive to user input once the page has finished loading.

Choice of language for server implementation

Research question 2 asked whether there would be differences in performance between server-client architectures implemented using different web technologies on the server side. The results showed that the Node.js server performed better than the Flask server in terms of both page loading time and time to respond to user actions. Comparisons of different programming languages and web technologies have shown that JavaScript performs faster than Python on the quicksort algorithm [7], and that Python generally performs worse than other languages when used in web applications while Node.js performs quite well [6, 29]. A reason for selecting Python 3 as the language for one of the server implementations was to enable the use of pandas for data processing. pandas claims to be “fast, powerful, flexible and easy to use”, and the pandas website notes that it is highly optimized for performance, with parts of the code implemented in Cython or C [30]. Despite this, the present study found that the use of pandas did not lead to better performance than a straightforward implementation of the data processing in JavaScript. It should be noted, however, that pandas was used as part of a larger system and not tested in isolation against other data processing implementations. Nonetheless, it seems clear that when high performance is necessary, a server implemented in Node.js is a better choice than a Python server using pandas.
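To illustrate the kind of "straightforward implementation of data processing in JavaScript" that the Node.js server could reuse from the client, a grouped average of the sort a yearly curve requires might look like this. The field names are assumptions; this is the plain-JavaScript counterpart of a pandas groupby-and-mean, not the study's exact code.

```javascript
// Plain-JavaScript group-and-average, sketching the kind of
// processing pandas' groupby/mean would do on the server;
// field names are assumptions.
function yearlyCurve(readings) {
  const groups = new Map();
  for (const { year, temp } of readings) {
    const g = groups.get(year) ?? { total: 0, count: 0 };
    g.total += temp;
    g.count += 1;
    groups.set(year, g);
  }
  return [...groups.entries()]
    .sort(([a], [b]) => a - b)
    .map(([year, { total, count }]) => ({ year, avg: total / count }));
}
```

A single linear pass like this involves no cross-language overhead, which may partly explain why the hand-written JavaScript kept up with the optimized pandas routines.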

Implications of methodological choices

Selection of algorithms in the different implementations

In this study, three different implementations of the same application have been compared. The results give a clear indication of which implementation shows the best overall performance. However, it is possible that implementation details may have affected the results in unknown ways, particularly when comparing implementations using different programming languages. In the comparison between the pure client architecture and the Node.js server-client architecture, the code for processing data could be reused to a great extent, with only small adaptations such as sending data as server responses. However, the Flask server-client implementation used pandas to process data, which led to different algorithms being used to retrieve data and perform calculations. There is therefore a possibility that the data processing algorithms of one of these implementations are less efficient than those of the other, and that a better selection of algorithms could alter the results in one direction or the other.

Selection of programming languages and tools

The selection of programming languages and tools for implementation, as well as details of the architecture types, may affect the results to a large extent, especially in the comparison between the two server-client architectures. In the present study, Python 3 was selected for one of the backend servers to allow the use of pandas as the data processing library, whereas Node.js was selected for the other server implementation since it could reuse code from the pure client data layer. However, there are many other possible web technologies that would have been interesting to investigate, and it is possible that some of them would perform better than the ones selected here. As an example, none of the server-client implementations used a database to store the data, and it is possible that using a database would have enabled more efficient data retrieval.

System architecture design

As noted in the section about related research, attempts have been made to design different types of workflows for visualization applications, for example by generating visualization images on the server and transferring only those images to the client [21, 22]. It is possible that system designs such as these would lead to very interesting results and that performance could be better than observed in this study.

Optimization of performance

As mentioned in previous sections, there are techniques that could be employed to lower the page loading time and response times, and no attempt was made to use them in this study. It is therefore likely that the performance of all three implementations could be optimized to some degree.

Furthermore, performance could also be improved by taking advantage of caching of data and increased pre-processing. Caching of data in the server could lead to reduced page loading times, although this would be more relevant for the server-client implementations, where the amount of data that must be cached is reasonable. In the server-client implementations, it could also be possible to perform more pre-processing of data as the server was started, which could lead to less processing being performed when the client requested specific calculations. However, this would not be in line with the delimitations set for this study as it might mean that the implementations could not be used with live data where new data points could arrive during a session.
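The caching idea can be sketched as a simple server-side memoization layer. This is a hypothetical illustration of the technique discussed above; the study's implementations did not include it, and the names are invented.

```javascript
// Hypothetical memoization of per-request responses: repeated
// requests for the same key skip reprocessing entirely.
function makeCachedEndpoint(compute) {
  const cache = new Map();
  let hits = 0;
  return {
    get(key) {
      if (cache.has(key)) {
        hits += 1;
        return cache.get(key);
      }
      const value = compute(key);
      cache.set(key, value);
      return value;
    },
    hitCount: () => hits,
  };
}

// Usage: the expensive computation runs once per distinct key.
let computations = 0;
const endpoint = makeCachedEndpoint((country) => {
  computations += 1;
  return `processed data for ${country}`;
});
endpoint.get("Norway");
endpoint.get("Norway");
```

As noted in the text, a cache like this assumes the underlying data does not change mid-session, which is why it conflicts with the live-data delimitation of the study.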


Hardware specifications and choice of platforms

The study does not take into account effects of using different platforms or browsers. The laptop used when measuring performance had a rather modern Intel i7 processor and 12 GB RAM, and it is likely that data processing could be executed more efficiently on other platforms or by taking advantage of the multicore processor.

Local vs remote server

All three implementations were tested using a local server, meaning that files did not have to be transferred from a remote server as is the case on a deployed web page. Because of this, the study does not take bandwidth limitations and effects of network conditions into account, and as a result, the observed page loading times and response times are shorter than they would be if data were retrieved from a remote server.

Dataset size

While the dataset used in this study put some strain on both the pure client implementation and the server-client implementations in terms of loading it into memory and processing data, it would not be classed as “big data” according to the general interpretation of this term [31]. Had the dataset truly been big data, it would not have been possible to use the system architectures employed in this study, due to the amount of data that must be stored and processed. Instead, system models more appropriate for big data would be required, such as, for example, Apache Hadoop (https://hadoop.apache.org/). Nonetheless, this study gives valuable information about performance in applications using a moderately sized dataset.

Future work

This study paves the way for many interesting new projects. As mentioned in the previous sections, the selection of implementation languages, tools and algorithms may lead to different results than those observed here, and it would be beneficial to look more closely into which choices would lead to the best performance for this type of application. For example, the use of databases suited to fast data retrieval could mean faster data processing, leading to increased performance. A more elaborate architecture, for instance through efficient use of a multicore server for data processing could also be a promising way to improve performance.

Another interesting avenue for new studies would be to look at the effects of different platforms, browsers and network conditions on the performance of the application. As this study only tested performance on a single laptop computer using a single browser, the results may look slightly different on other platforms and with other software. In particular, investigations into performance on mobile platforms would be highly relevant in an age where the internet is accessed to a large degree through mobile devices [26].

In this study, the files were stored locally on the same computer as the application was run. Performance should also be measured in more realistic situations where data must be transferred across networks in various conditions. It is possible to simulate network conditions through, for example, Chrome DevTools. Another factor to take into account in a more realistic setting would be the server performance when many users access the application at the same time.

The dataset used in this study was static, meaning there was no new data arriving while the application was running. Handling live data would lead to new challenges to the performance of the application, in terms of cleaning, processing, storing and displaying data efficiently. Visualization of live data is a powerful tool that can be useful for example in anomaly detection, and finding efficient ways of creating such visualizations is a popular research topic.

Conclusions

In this study, it has been shown that there are striking differences in the performance of a visualization application depending on both the architecture of the system and the selection of tools used for the implementation. It seems clear that for an application based on a dataset of some size, a server-client architecture with a server written in Node.js may achieve the best performance of the implementations tested here. To some extent there is a trade-off between longer page loading time and shorter response times to user actions. For a small dataset that can be transferred across the network in a short amount of time and processed efficiently, it is likely beneficial to choose a pure client architecture due to the short response times, which make the page seem very responsive. However, when the dataset grows large, the benefit of shorter response times will not outweigh an unreasonably long page loading time, and the issues associated with loading a large amount of data into the browser memory will be severe enough that this strategy is undesirable.

The study also highlights the importance of selecting programming languages and tools that lead to good performance. The comparison between the Node.js server and the Flask server showed that these implementation details can lead to marked differences in performance, and an informed choice may have a notable effect on the application.

7 Acknowledgments

I would like to thank MindRoad AB for the opportunity to perform this project. A special thank you to my supervisor at MindRoad AB, Åsa Detterfelt, for excellent input and support. I would also like to thank Jody Foo and Peter Dalenius as well as my classmates at Linköping university for valuable comments throughout the project. Finally, I would like to thank Anders Petersson for his insightful advice and continuous encouragement.


8 References

[1] Dobre, C., & Xhafa, F. (2014). Intelligent services for big data science. Future generation computer systems, 37, 267-281.

[2] Sivarajah, U., Kamal, M. M., Irani, Z., & Weerakkody, V. (2017). Critical analysis of Big Data challenges and analytical methods. Journal of Business Research, 70, 263–286.

[3] Pang, C. (2018). Spatiotemporal Visualization Literature Review.

[4] Chen, W., Huang, Z., Wu, F., Zhu, M., Guan, H., & Maciejewski, R. (2018). VAUD: A Visual Analysis Approach for Exploring Spatio-Temporal Urban Data. IEEE Transactions on Visualization and Computer Graphics, 24(9), 2636–2648.

[5] Wu, R., Painumkal, J. T., Randhawa, N., Palathingal, L., Hiibel, S. R., Dascalu, S. M., & Harris, F. C. (2016). A New Workflow to Interact with and Visualize Big Data for Web Applications. 2016 International Conference on Collaboration Technologies and Systems (CTS), 302–309.

[6] Kemer, E., & Samli, R. (2019). Performance comparison of scalable rest application programming interfaces in different platforms. Computer Standards & Interfaces, 66, 103355.

[7] Åkesson, T., & Horntvedt, R. (2019). Java, Python and Javascript, a comparison (Dissertation). Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-20007.

[8] Mashey, J. R. (1998). Big Data and the Next Wave of InfraStress Problems, Solutions, Opportunities. In 1998 USENIX Annual Technical Conference, Invited Talk, Marriott Hotel, New Orleans, Louisiana (pp. 15-18).

[9] Nguyen, T. L. (2018). A Framework for Five Big V’s of Big Data and Organizational Culture in Firms. 2018 IEEE International Conference on Big Data (Big Data), 5411–5413.

[10] Chen, C. P., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, 314–347.

[11] Nah, F. F. H. (2004). A study on tolerable waiting time: how long are web users willing to wait? Behaviour & Information Technology, 23(3), 153–163.

[12] Stringam, B., & Gerdes, J. (2019). Service gap in hotel website load performance. International Hospitality Review, 33(1), 16–29.

[13] An, D. (2018). Find out how you stack up to new industry benchmarks for mobile page speed. Retrieved from https://www.thinkwithgoogle.com/marketing-resources/data-measurement/mobile-page-speed-new-industry-benchmarks/.

[14] Wang, Z., & Phan, D. (2018). Using page speed in mobile search ranking. Retrieved from https://webmasters.googleblog.com/2018/01/using-page-speed-in-mobile-search.html.

[15] Manhas, J. (2013). A Study of Factors Affecting Websites Page Loading Speed for Efficient Web Performance. International Journal of Computer Sciences and Engineering (IJCSE), 1, 32-35.

[16] Wang, X. S., Krishnamurthy, A., & Wetherall, D. (2016). Speeding up web page loads with shandian. In 13th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 16) (pp. 109-122).

[17] Nejati, J., & Balasubramanian, A. (2016). An In-depth Study of Mobile Browser Performance. Proceedings of the 25th International Conference on World Wide Web - WWW ’16, 1305–1315.

[18] Wagner, J. (2019). Lazy Loading Images and Video. Retrieved from https://developers.google.com/web/fundamentals/performance/lazy-loading-guidance/images-and-video.

[19] Jern, M. (1999). “Thin” vs. “Fat” Visualization Clients. In Digital Convergence: The Information Revolution (pp. 159-173). Springer, London.

[20] Janicki, J., Narula, N., Ziegler, M., Guénard, B., & Economo, E. P. (2016). Visualizing and interacting with large-volume biodiversity data using client– server web-mapping applications: The design and implementation of antmaps.org. Ecological Informatics, 32, 185–193.

[21] Wu, R., Painumkal, J. T., Randhawa, N., Palathingal, L., Hiibel, S. R., Dascalu, S. M., & Harris, F. C. (2016). A New Workflow to Interact with and Visualize Big Data for Web Applications. 2016 International Conference on Collaboration Technologies and Systems (CTS), 302–309. https://doi.org/10.1109/CTS.2016.0063

[22] Wessels, A., Purvis, M., Jackson, J., & Rahman, S. (2011). Remote Data Visualization through WebSockets. In 2011 Eighth International Conference on Information Technology: New Generations, 1050–1051.

[23] Atluri, G., Karpatne, A., & Kumar, V. (2018). Spatio-Temporal Data Mining: A Survey of Problems and Methods. ACM Computing Surveys, 51(4), 1–41.

[24] Compieta, P., Di Martino, S., Bertolotto, M., Ferrucci, F., & Kechadi, T. (2007). Exploratory spatio-temporal data mining and visualization. Journal of Visual Languages & Computing, 18(3), 255–279.

[25] Berkeley Earth. (2017). Climate Change: Earth Surface Temperature Data, Version 2. Retrieved from https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data.

[26] Statista. (2020). Percentage of mobile device website traffic worldwide from 1st quarter 2015 to 4th quarter 2019. Retrieved from https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/.

[27] Persson, M. (2020). A survey of methods for visualizing spatio-temporal data – Visualizing space and time.

[28] Kelton, C., Ryoo, J., Balasubramanian, A., & Das, S. R. (2017). Improving user perceived page load times using gaze. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17) (pp. 545-559).

[29] Lei, K., Ma, Y., & Tan, Z. (2014). Performance Comparison and Evaluation of Web Development Technologies in PHP, Python, and Node.js. In 2014 IEEE 17th International Conference on Computational Science and Engineering, (pp. 661–668).

[30] pandas. (n.d.). About pandas. Retrieved from https://pandas.pydata.org/about/index.html

[31] Zhang, Y., Ren, J., Liu, J., Xu, C., Guo, H., & Liu, Y. (2017). A Survey on Emerging Computing Paradigms for Big Data. Chinese Journal of Electronics, 26(1), 1–12.
