Personalized visualization of blog statistics


Department of Science and Technology (Institutionen för teknik och naturvetenskap)

LiU-ITN-TEK-G-13/005-SE

Personlig visualisering av bloggstatistik (Personalized visualization of blog statistics)

Tina Durmén Blunt

Thesis work carried out in Media Technology at the Institute of Technology, Linköping University

Supervisor: Patrik Lundblad

Examiner: Jimmy Johansson



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Abstract

This report documents the research, implementation and results of a master’s thesis in Media Technology and Engineering at Linköping University. The aim of the project was to develop a personalized application for visualizing blog statistics, to be deployed on a web-based community for blog authors. The purpose of the application is to give users a tool for exploring the statistics connected to their own blog. The application design was developed and implemented based on a literature study in usability and information visualization. The implementation resulted in a JavaScript-based application, BlogVis, that allows users to compare their own blog statistics with those of other blogs, and to compare periods of time in the statistics history of their blog.


Chapter 1 Introduction

1.1 Background

Social media is a major source of information for people and companies all over the world. Alongside Facebook[1], Twitter[19] and Instagram[11], blogs play an important role in connecting people and spreading information to a wide audience. In the business world, blogs are a new and powerful way of promoting company brands and expanding the customer network.

Twingly[3] is a data mining company that focuses on indexing blogs all over Europe. Apart from distributing raw data to data customers, Twingly also provides companies with solutions to encourage blog authors to write more about their brand and products.

Among the services connected to the blog index, Twingly also provides the blog community Bloggportalen.se[2]. The aim of Bloggportalen is to build a closer relationship with, and a more direct information channel to, the blog authors. Bloggportalen is a community for Swedish blog authors and provides many services, including top-lists, blog search and a statistics widget for the registered blogs. The statistics widget registers the number of visitors to a blog, but on Bloggportalen this information is only used to display the number of visitors from the previous week. Twingly therefore wishes to make more extensive use of the stored visit statistics.

1.2 Objectives

For Twingly to expand and to retain its current customers, it is crucial that the blog authors keep writing blog posts and sharing links and recommendations. Without the writers’ activity and involvement, Twingly’s products lose efficiency.

For the blog authors, Twingly’s services are a way of gaining readers. However, there is currently no feedback on how the use of Twingly’s services affects the number of visitors to a blog. By giving the blog authors a versatile tool for exploring the statistics connected to their own blog, Twingly aims to increase linking and blog writing. The knowledge that the application provides will hopefully encourage the blog authors to be more active in order to improve their blog statistics. The statistics visualization application should, in a simple, fun and accessible way, let users explore information: both to compare their blog with other blogs and to explore the statistics history of their own blog.

1.3 Basic conditions

For the application to be both accessible and customized for each blog, it should be web-based and located in the user profile section on Bloggportalen. Bloggportalen currently has over 100,000 users, so a large amount of data is stored. The aim is to visualize the data, by filtering and selecting it and by applying information visualization strategies, in such a way that the user can turn it into knowledge.

Furthermore, the blog authors form a wide user group with varying computer knowledge, age, social background and education. The application should strive to be understandable for, and appeal to, the entire spectrum of the user group.

1.4 Limitations

Because the project has a fixed time frame, the final functionality of the application has to be restricted. Specifying the restrictions at the beginning of the project makes it easier to keep focus, to choose which implementation steps to prioritize, and to meet the time limit.

The project is limited in the following ways:

− Only three filtering options are used, and the application is adapted to them.

− The application is web-based and mainly developed for Google Chrome. Development for other browsers, tablets and smartphones is outside the scope of this project.

− The focus is on concept design, interaction design and application implementation. The data used in the application is extracted from existing databases and API functionality on Bloggportalen and Twingly. Visualization components that demonstrate statistics not currently stored in the databases, for example geographic data, use local test data.

− Because of time constraints, usability tests and direct communication with the user group are excluded.

1.5 Target group

This thesis is written for a target group with previous knowledge in visualization, programming and mathematics. The reader is expected to have a background in computer science.

1.6 Thesis outline

This thesis covers the application design, implementation and results of developing a web-based information visualization application under the conditions given above. The following chapters are structured as follows:

• The Theoretical Background chapter gives an introduction to information visualization and to the usability aspects of application design.

• The development process chapter documents some of the working methods and supporting tools used for the concept design.

• The BlogVis Prototype chapter documents the concept design process. The application structure is motivated and the functionality of the selected visualization components is explained.

• The Implementation chapter gives an insight into the code architecture and the programming languages chosen for the implementation. Furthermore, the extraction of the data and the working methodology are explained.

• In the Discussion section, the results of the project are discussed and possible future work is documented.


Chapter 2 Theoretical Background

2.1 Perception

Evolution has given human vision the ability to quickly select and deselect information from what the eyes detect. The human brain cannot process everything, so most of what the eyes detect is discarded before it reaches our awareness[17]. By using colours, shapes, structures and animations, the user’s attention can be guided to what is important for the task, for example by making important elements stand out from others, or by grouping elements together and giving them a uniform shape to provide an overall perspective[9].

2.1.1 Preattentive processing

Figure 2.1: Preattentive processing is a selection process that human vision performs automatically when the eyes send visual information to the brain. The user’s focus can be guided by applying preattentive processing principles to colour, shape or position.

Preattentive processing is a step in the process of selecting which elements will be focused on and perceived by the brain. Processing is considered preattentive if the brain registers an object in less than approximately 250 milliseconds. In that case the object is perceived before our consciousness performs the actual selection. We therefore experience preattentive processing as something that vision does automatically. This theory describes how some elements can easily be distinguished from others by their colour, shape or orientation[9], see figure 2.1. Preattentive processing is an important aspect to keep in mind when developing a user-centred application.

2.1.2 The Gestalt laws

The principle of the Gestalt laws describes how the human vision sees and interprets a group of elements. Unconsciously the human brain tries to group the elements by their placement and shape. Furthermore the brain automatically fills in information to create shapes or context. In figure 2.2, the brain is operating according to the Gestalt laws. In the first section the brain creates lines of the items based on their shape. In the middle section the items get divided into groups, and in the third section the brain fills in information and creates a circle and a square, from what actually is only one object[9].

Figure 2.2: The principle of the Gestalt laws describes how human vision interprets a group of elements. The brain automatically creates patterns, forms groups or fills in information where it is needed. In image 1, vision creates lines from the different shapes. Image 2 illustrates how the brain groups the circles together depending on their position. In image 3, the brain fills in information: we see two different objects, a circle and a rectangle, in what is actually one object.

2.2 Information visualization

Storing large amounts of data is no longer a problem; the issue more often lies in managing and interpreting the stored information. Visualizing the data is one way of using and analysing it. Information visualization is the study of identifying the correct methods and models for turning data into reliable and comprehensible knowledge[9]. The aim is to support the user in finding context in large quantities of data. By providing the right tools for a specific task, information visualization applications allow data to be turned into knowledge more effectively and accurately.


2.2.1 The information visualization pipeline

Figure 2.3 shows the visualization pipeline. When creating a visualization application, all steps in the pipeline have to be developed and connected[18]. Initially the raw data has to be processed: it has to be stored, sorted and organized so that it can easily be loaded and used in the visual mapping stage. Filtering, selection and data mining can also be performed on large quantities of data to reduce or cluster it. In the visual mapping stage the data is represented by different shapes and structures for the visual display, which is the interface through which the user interacts with and interprets the data representations.

Figure 2.3: The visualization pipeline demonstrates all the steps in the visualization process. Once the raw data has been transformed into data tables, the visual mapping can be performed. In the visual mapping stage the data tables are represented by graphic shapes in different visualization components. This results in the visual display, which is the user interface.
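The stages of the pipeline can be illustrated with a minimal sketch. It is not taken from the BlogVis implementation: the record layout, the function names (`to_data_table`, `visual_mapping`, `render`) and the text-based rendering are assumptions made only to show how raw data, data tables, visual mapping and visual display connect.

```python
from dataclasses import dataclass

# Hypothetical raw records, e.g. rows exported from a visit-statistics store.
RAW = [
    {"blog": "a", "visitors": "120"},
    {"blog": "b", "visitors": "60"},
    {"blog": "c", "visitors": None},  # incomplete record
]


@dataclass
class Bar:
    label: str
    height: float  # pixels


def to_data_table(raw):
    """Data transformation: drop incomplete rows, convert types, sort."""
    rows = [(r["blog"], int(r["visitors"])) for r in raw if r["visitors"] is not None]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def visual_mapping(table, max_height=100.0):
    """Visual mapping: each row becomes a bar whose height encodes the value."""
    largest = max(value for _, value in table)
    return [Bar(label, max_height * value / largest) for label, value in table]


def render(bars):
    """Visual display: a text rendering stands in for the user interface."""
    return [f"{bar.label} {'#' * round(bar.height / 10)}" for bar in bars]


bars = visual_mapping(to_data_table(RAW))
lines = render(bars)
```

Each function corresponds to one arrow in the pipeline, so a change in one stage, such as a different filter or a different chart type, does not require changes in the others.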

2.2.2 Data

Figure 2.4 illustrates the dataset terminology used in the data visualization contexts of this report. Each item in the dataset represents a specific object, for example a blog. The variables are the data values connected to each object. An item can be described by many variables; for example, a blog can have a number of visitors, a geographic position and a category. Visualization components often display all items for a variable. A dimension represents the item values for one variable, which has one of three data types: numeric, ordinal or nominal[15]. Visual components often differ in the number of dimensions they display. Figure 2.6a shows a one-dimensional chart, in which a single variable determines the height of the bar. Figure 2.6b shows a multidimensional scatter plot that displays four dimensions of data: the x- and y-position, the size and the colour of the dots are each determined by a variable from the dataset.

A multidimensional chart contains more information and more dependencies between variables. However, the more dimensions a chart displays, the more complex the visualization becomes. Displaying a large number of dimensions in one component requires a higher level of analytic skill from the user.

Figure 2.4: The dataset in the figure demonstrates the terminology for objects and their values. An item is a specific object, for example a blog, and the variables are the different values an object holds. Each variable in the dataset has one data type: ordinal, numeric or nominal.
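The terminology can be made concrete with a small sketch. The variable names, the example values and the helper `dimension` are hypothetical and only illustrate the relation between items, variables, data types and dimensions.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    NUMERIC = "numeric"  # quantities, e.g. number of visitors
    ORDINAL = "ordinal"  # ordered categories, e.g. a top-list rank
    NOMINAL = "nominal"  # unordered categories, e.g. blog category


# Variables (columns) of a hypothetical blog dataset and their data types.
VARIABLES = {
    "visitors": DataType.NUMERIC,
    "rank": DataType.ORDINAL,
    "category": DataType.NOMINAL,
}


@dataclass
class Item:
    """One item (row) of the dataset: a single blog and its variable values."""
    name: str
    values: dict


def dimension(items, variable):
    """A dimension: the values of one variable across all items."""
    return [item.values[variable] for item in items]


items = [
    Item("blog-a", {"visitors": 1200, "rank": 1, "category": "fashion"}),
    Item("blog-b", {"visitors": 300, "rank": 2, "category": "fitness"}),
]
visitors = dimension(items, "visitors")
```

A one-dimensional chart would consume a single list such as `visitors`, while a multidimensional chart would consume one such list per visual attribute.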

2.2.3 Focus and context

When developing an application for a large dataset, the aim is often to show the part of the dataset that is most important for the user’s task. When only a part of the information is shown, the overall understanding can be lost and usability often suffers. However, when only the overall view is shown, the relevant detailed information is never explored by the user, see figure 2.5a.

The idea of focus and context[15] is to display the part of interest in high detail and the surrounding parts in a low-detail context view. This gives the user access to detailed information about the dataset without losing the context of the parts in focus. Figure 2.5b illustrates focus and context applied to figure 2.5a.

2.2.4 Visualization components

Many visualization components are frequently used in information visualization today[15]. Each has its own strengths and weaknesses and supports different data types.


(a) No focus and context applied.

(b) Focus and context applied.

Figure 2.5: In figure (a) no focus and context is applied, and the attention of the user is more difficult to guide. With the principle of focus and context, the interesting part of the dataset can be displayed in high detail while the context view is kept, see figure (b). This principle supports the user in perceiving the most useful information.

(a) One-dimensional. (b) Multi-dimensional.

Figure 2.6: Information visualization components can display a varying number of variables, also referred to as dimensions. Figure (a) illustrates a one-dimensional chart: one variable from the dataset affects the rectangle, in this case its height. Figure (b) illustrates a multi-dimensional chart: four variables determine the x- and y-position, colour and size of each dot.


Figure 2.7: The bar chart is a one-dimensional visualization component. An item in the dataset is represented by a rectangle, and the variable y defines the height of the rectangle.

Bar chart

A bar chart, see figure 2.7, is a one-dimensional visualization chart where the height of the bar represents the value of a variable for one item in the dataset.

Table lens

The table lens, see figure 2.8, displays a large number of bars representing one variable of the dataset. An application often displays several table lenses side by side in order to show multiple dimensions of the data. To display more detail, the table lens is often combined with the fish eye technique[15], which lets the user select areas of the table lens to be displayed in higher detail.
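One simple way to realise a fish eye effect in a table lens is to give the rows around the focus more space than the context rows while keeping the total size fixed. The sketch below is a generic illustration under that assumption, not the method used in BlogVis; the function name and the parameters `radius` and `magnification` are hypothetical.

```python
def fisheye_row_heights(n_rows, total_height, focus, radius=1, magnification=3.0):
    """Distribute total_height over n_rows.

    Rows within `radius` of the `focus` row get `magnification` times the
    height of the remaining (context) rows.
    """
    weights = [
        magnification if abs(i - focus) <= radius else 1.0
        for i in range(n_rows)
    ]
    unit = total_height / sum(weights)
    return [w * unit for w in weights]


# Ten rows in a 160 pixel tall chart, with the focus on row 4.
heights = fisheye_row_heights(n_rows=10, total_height=160.0, focus=4)
```

Because the heights always sum to the total, the context rows remain visible in compressed form while the focused rows gain room for labels or extra detail.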

Scatter plot

The scatter plot, see figure 2.9, displays at least two variables of the dataset by positioning shapes in a two-dimensional space. By using colour, size and different shapes, a larger number of dimensions can be displayed by the plot.

Pie chart

The pie chart, see figure 2.10, displays the data items in a circle. By varying the angle and radius of an item, the pie chart can display multiple dimensions. Layering circles of different sizes in the pie chart produces a graph that can display even more dimensions.


Figure 2.8: The table lens is a one-dimensional visualization component where each data item is represented by a bar whose width is determined by a variable value. The table lens displays a large number of items, positioned either horizontally or vertically. When the table lens is combined with the fish eye technique, focus and context are applied to the chart, which allows the user to get more information on selected items.

Figure 2.9: The scatter plot is a multi-dimensional visualization component where each data item is represented by a circle. Two different variable values determine the x- and y-position of each item. To add more dimensions, size, colour and different shapes can be used. In figure 2.9 the item colour does not add a dimension, since the dots are coloured based on the y variable value.


Figure 2.10: The pie chart displays each data item as a part of a circle. The variable values can affect the radius and angle of each item. In figure 2.10 one variable value determines the radius of each part.

Figure 2.11: The geographic display allows the user to compare variable values between different geographical areas. By using colour scales, regions with high values can be distinguished from the ones with lower variable values.

Geographic display

By using geographical display charts, see figure 2.11, the variables in a dataset connected to a geographical area can be displayed. This gives the user the ability to compare different areas.

2.3 Usability

Since the purpose of information visualization applications is to extract knowledge from data, it is important to take a user-centred approach when designing the application[4]. A visualization loses efficiency if the user is not able to comprehend the navigation, structure or correlations in the visual display. The general needs, interests and capabilities of the user have to be taken into account when developing an application. Furthermore, aspects such as age, country, culture, gender, training, perceptual and cognitive skill level and motivation are important when creating an application adapted to the user[9].


2.3.1 Dashboard design

When creating an information visualization application, many charts and components are usually combined in the user interface. The placement and functionality of the components should enhance usability and user interaction, both for each component individually and for the connections between them. By making use of human vision and perception, the user’s attention can be guided to the desired focus areas and actions[17].

Norman’s usability guidelines:

• Visibility. Allow the user to work out the current state of the system and the range of actions possible.

• Provide Feedback. Give continuous, clear information about results of action.

• Present a good conceptual model. Allow the user to build up a true picture of the way the system holds together, the relationship between its different parts and how to move from one state to another.

• Offer good mapping. Aim for clear, natural relationship between actions the user performs and the results they achieve.

The following list states points to keep in mind when developing the mapping of the application [12][4]:

– Consistency - The visualization charts should respond in the same way to the actions of the user throughout the entire application. The actions should work in an expected way that will not irritate the user.

– Coherence - All individual elements should be based on a cohesive design that clearly shows what belongs to the application.

– Information placement - The positioning of the information is crucial for what is perceived by the user.

– Colour - Colour can be used to enhance coherence, and it is a powerful tool for making specific items stand out and less important ones blend into the background.

– Text clarity - If the text is not displayed clearly, it will degrade the user experience, or the information may not reach the user at all.


Chapter 3 The development process

3.1 Agile

The charts in the application were implemented using the agile working methodology[14]. This allows a dynamic workflow and makes the planning less sensitive to changes and to the implementation of additional functionality. A product backlog was used to list and rate the small tasks to be implemented. Every week, task items were selected from the product backlog, rated and time-estimated in a sprint. This captured all updates and changes that appeared during the implementation stage and made the work more effective. Thanks to the item rating, the intended functionality of the application was kept in focus.

3.2 Persona

Besides adapting the application to general usability guidelines (section 2.3), the level of analytic complexity has to be balanced. A user interface is interpreted differently depending on who the user is. This has to be taken into account when developing applications to be explored by a user group. For example, a weather forecast map has to be designed differently depending on whether it will be used by a meteorologist or by someone without deeper knowledge of the area. Because of the expert’s previous knowledge and experience, the complexity can be at a higher level, while the inexperienced user will require a lower level of complexity.

When developing the application, the user group has to be considered. To achieve the best results in the level of analytic complexity, the application should be developed for the average user. A persona can be used, which is a fictitious person that represents the user group. In this case, the user group is wide and contains a large variety in all areas. The persona was chosen based on the current largest categories on Bloggportalen.


The persona used for the blog visualization:

• Age: 19 years

• Gender: Woman

• Country: Sweden

• Education: High-school

• Computer habit: Average

• Knowledge in statistics: Basic

• Interests: Fashion, fitness, beauty, party, social events

• Motivation in usage of the statistic application:

– Compare her own blog to others and look at the most visited blogs.

– Explore other blogs similar to her own and similar to her favourite blogs.

– Learn how the number of visits has increased or decreased over time since the blog started.

3.3 Lo-Fi prototypes

Lo-Fi prototypes are a useful tool for supporting both the planning of the code structure and the selection of visualization charts. Designing a prototype in the early stages of the project prevents the implementation of functionality that will not be used in the final application. Furthermore, the prototypes can be used for early user tests or as a support for feedback before starting the implementation stage. In figure 3.1, the final Lo-Fi prototypes for the visualization application are shown. The application was implemented with these prototypes as templates. Some functionality and chart designs were changed during the implementation, but the main structure was kept.


(a) First application view.

(b) Second application view.

Figure 3.1: Lo-Fi application prototypes were designed in the early stages of the project to facilitate the implementation process. In this stage of the application design the focus was on concept design; details in functionality and graphics were addressed later in the design process. The Lo-Fi prototypes allow changes and are a good way of getting a general understanding of the concept, which is a good base for brainstorming and structuring the charts. The Lo-Fi prototypes in figure 3.1 represent the two main views of BlogVis.


Chapter 4 BlogVis Prototype

Based on the objectives, knowledge and interests of the persona (section 3.2) and the Lo-Fi prototypes (section 3.3) the visualization application BlogVis was designed. The structure of BlogVis is divided into two main views: the comparison view (figure 4.1a) and the personal view (figure 4.1b).

In the comparison view the users can compare their own blog to others. Filtering options are also available to display the most visited blogs as well as blogs similar to their own. The users can also look more closely at blogs in the same category and the same geographical area as their own blog.

In the personal view the users can see how their own blog statistics have developed over the history of the blog.

4.1 Design choices

Many aspects have to be considered when designing BlogVis. All design choices should support usability (section 2.3) and facilitate user interaction and perception (section 2.1) in performing the task. To make the application coherent, colours and shapes were chosen to be similar throughout the entire application. Moreover, besides following Norman’s usability guidelines (section 2.3.1), these specific steps for increasing usability were taken into account[16]:

• Create a clear visual hierarchy. Using menus and sub-menus helps the user not to feel confused when navigating in the application. A clear hierarchy shows what belongs to every part of BlogVis, as well as how to get back or forth in the interaction history. This will portray a good conceptual model and good mapping. The choice was made to have a clear main menu, see figure 4.2, and to have distinct buttons that would clearly portray which options the user has activated.


(a) In the comparison view the users can compare their own blog to others. Filtering options allow the users to select blogs in one category or region as well as provide the users with the option to explore and compare blogs on the top-list.

(b) In the personal view the users can explore the statistics in the history of their own blog. The option of focusing on blog posts, geographic position of the reader or visit statistics are provided for the user.

Figure 4.1: The BlogVis application allows the users to explore and compare other blogs registered on Bloggportalen (figure 4.1a) or to look at the statistic history of their own blog (figure 4.1b).


Figure 4.2: To demonstrate a clear visual hierarchy in the application a main menu was designed to inform the user of which view is active and how to navigate between them.

Figure 4.3: For BlogVis to be able to guide the focus of the user and demonstrate a clear chart structure the visual noise was kept down and the page was broken up into clear areas containing each chart.

Figure 4.4: In guiding the user interaction it is important to clearly show the user which objects in the application are interactive. Changing the colour and the mouse cursor when hovering over an object informs the user about which action is possible for the highlighted object.


• Keeping the visual noise down and breaking up the page into clearly defined areas makes the application easier to comprehend. It makes each component stand out and makes it easier to get an overview of the structure of BlogVis. The choice was made to frame each component with a blue square in the background, see figure 4.3. Furthermore, the background was kept in a simple, solid white colour to reduce the visual noise.

• Making obvious what is clickable. All interactive elements should signal that they are interactive; otherwise some of the functionality will never be explored by the user and the application loses efficiency. By using colour changes and changing the mouse cursor, the user is guided and informed of the possible actions for the object in focus. Figure 4.4 demonstrates the different cursors used in BlogVis and their associated interactions.

4.2 The application size

To make the application easier to use, browser scrolling was excluded. Scrolling does not support the user in completing the task, and displaying only parts of the application behind a scroll option does not help the overview. Instead, the size of the BlogVis application is adapted to the size of the browser window. This causes problems when the computer screen is too small or when the browser window is reduced to a small size. If the height and width have an unusual ratio, the application structure will differ significantly and the application will lose usability. In this case, excluding browser scrolling was judged preferable to having a static application size. The majority of users are not expected to experience the usability issue with unusual screen ratios.
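The effect of adapting the layout to the window can be sketched as follows. The panel names and their fractions of the window are hypothetical and are not taken from BlogVis; the sketch only shows how a layout defined in relative units scales with the window and how an unusual aspect ratio changes the shape of each panel.

```python
# Hypothetical panel layout as fractions of the window: (x, y, width, height).
PANELS = {
    "filter": (0.0, 0.0, 0.25, 1.0),
    "table_lens": (0.25, 0.0, 0.25, 1.0),
    "scatter_plot": (0.5, 0.0, 0.5, 1.0),
}


def fit_to_window(window_width, window_height):
    """Scale every panel to the window so that no scrolling is needed."""
    return {
        name: (x * window_width, y * window_height,
               w * window_width, h * window_height)
        for name, (x, y, w, h) in PANELS.items()
    }


wide = fit_to_window(1600, 800)    # a common landscape window
narrow = fit_to_window(600, 800)   # an unusual, portrait-like window
```

In the wide window the scatter plot panel is square (800 by 800 pixels), while in the narrow window it becomes a tall strip (300 by 800 pixels), which illustrates how the structure differs when the ratio is unusual.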

4.3 Visualization components

The application should display as much information as possible without becoming too complex for the user to comprehend and analyse the charts. BlogVis was therefore adjusted to the user’s estimated knowledge and ability to interpret graphs, and multidimensional visualization components (subsection 2.2.2) are excluded. By using many different components, clearly connected to each other and each displaying a low number of dimensions, the analytic complexity of the application is minimized.


Figure 4.5: The filtering component was implemented to allow the user to select which blogs to explore in the other components of the comparison view. The provided filtering options are total, category and geographic region as well as the choice to compare blogs similar to the blog of the user or to look at the top-list blogs on Bloggportalen.

4.4 The comparison view

The aim of this part of BlogVis is to display an overview of the blog world from the perspective of the user, covering both the blogs closest to the blog of the user and the most visited ones. The statistics for each blog item are based on the number of visitors, links and blog posts from the previous month.

A filtering function, see figure 4.5, was designed so that the user can select interesting parts of the data to explore further in the application.

The dataset contains one hundred specific blogs that are selected depending on which filter values are active. These blogs are then shown in the table lens component (section 2.2.4) and in the scatter plot component (section 2.2.4).
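The selection step described above can be sketched as follows. This is a minimal illustration, not the BlogVis implementation: the record fields (`name`, `category`, `region`, `visitors`) and the function name `selectBlogs` are assumptions, and only the category and region filters are covered.

```javascript
// Hypothetical blog records; the field names are assumptions for illustration.
const blogs = [
  { name: "A", category: "fashion", region: "Stockholm", visitors: 1200 },
  { name: "B", category: "food", region: "Skane", visitors: 900 },
  { name: "C", category: "fashion", region: "Skane", visitors: 5000 },
  { name: "D", category: "fashion", region: "Stockholm", visitors: 300 },
];

// Return at most `limit` blogs that match every active filter,
// ordered by visitor count (highest first). A missing filter value
// means that the filter is inactive.
function selectBlogs(allBlogs, filters, limit = 100) {
  return allBlogs
    .filter((b) => !filters.category || b.category === filters.category)
    .filter((b) => !filters.region || b.region === filters.region)
    .sort((a, b) => b.visitors - a.visitors)
    .slice(0, limit);
}

// The same selection would feed both the table lens and the scatter plot.
const topFashion = selectBlogs(blogs, { category: "fashion" });
console.log(topFashion.map((b) => b.name));
```

Because `filter` returns a new array, sorting the selection does not change the order of the original dataset.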

The table lens, see figure 4.6, gives the user a quick overview of the blog data in a one-dimensional chart. The width of each bar represents the number of visitors in the previous month. The fish eye technique is also applied in the chart, so that the user can hover the mouse pointer over the elements to get more information.

The scatter plot (section 2.2.4) is used to display the number of visitors as well as the number of incoming links for the blogs, see figure 4.7.

Figure 4.6: The table lens shows the number of visitors for all blogs in the dataset. The width of each bar is determined by the number of visitors in the previous month for that blog. To highlight the user object and the selected object, preattentive processing (subsection 2.1.1) is used in the colouring: orange for the user and purple for the selected object. The fish eye technique (section 2.2.4) is implemented by a blue arrow that follows the mouse pointer when hovering over the chart.

Figure 4.7: The scatter plot is used to display the number of visitors as well as the number of links for a blog. To get more information about a blog, the user can select the item of interest by clicking on the object in the scatter plot. The user is also provided with the option to show the plot with linearly or logarithmically scaled axes.

(a) Linear scale. (b) Logarithmic scale.

Figure 4.8: Figures (a) and (b) demonstrate the difference in the positioning of the dots in the scatter plot when linear or logarithmic scale is selected. The linear scale is used initially in the application to avoid giving the user an inaccurate general perspective of the blog values. To increase visibility and distribute the dots in the two-dimensional space, the logarithmic scale was provided as an option for the user.

Figure 4.9: In the bar chart, two objects selected in the other charts are displayed and compared as bars with varying heights. The two selected blogs are positioned in pairs for three different variables: number of blog posts, visitors and links.

A problem that occurred was that the majority of the dataset items had values that positioned them close together in the two-dimensional space of the scatter plot, see figure 4.8a. This resulted in visibility issues and made it more difficult for the user to distinguish individual blogs.

The solution to the visibility issue is to apply logarithmic scales to the scatter plot axes, which distributes the items more evenly in the two-dimensional space, see figure 4.8b. The logarithmic scale can however be misleading for the inexperienced user. The values can easily be misinterpreted, especially for the low-value items, which visually seem to have higher values than they actually do. Therefore the linear scale is used initially, to give an accurate overview, and the user is provided with the option to change the axes to logarithmic scale.
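The effect of the two scale types can be illustrated with a small sketch. The functions `linearPosition` and `logPosition` and the sample visitor counts are assumptions made for this example; they are written in plain JavaScript and are not the scale functions used in BlogVis.

```javascript
// Map a value from [min, max] to a pixel position in [0, size] on a linear axis.
function linearPosition(value, min, max, size) {
  return ((value - min) / (max - min)) * size;
}

// The same mapping on a logarithmic axis. Requires value > 0 and min > 0,
// so items with a zero value (for example no incoming links) need special handling.
function logPosition(value, min, max, size) {
  const logMin = Math.log10(min);
  const logMax = Math.log10(max);
  return ((Math.log10(value) - logMin) / (logMax - logMin)) * size;
}

// Hypothetical visitor counts: most blogs are small, one is very large.
const visitors = [10, 100, 1000, 100000];
const axisSize = 400;

const linearPositions = visitors.map((v) => linearPosition(v, 10, 100000, axisSize));
const logPositions = visitors.map((v) => logPosition(v, 10, 100000, axisSize));

// On the linear axis the three smallest blogs fall within the first 4 pixels;
// on the logarithmic axis they are spread out at roughly 0, 100 and 200 pixels.
console.log(linearPositions, logPositions);
```

The sketch also shows why the logarithmic view can mislead: a blog with 1000 visitors is drawn halfway along the axis although it has only one percent of the visitors of the largest blog.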

The bar chart, see figure 4.9 and section 2.2.4, allows the user to compare two specific blogs. The heights of the bars represent the values of three variables. For each variable two bars are set side by side, one for each blog. The Gestalt law supporting object grouping (subsection 2.1.2) was used to separate the three different variables.

Figure 4.10: In the interaction of the comparison view the charts are connected in a hierarchical way. The figure shows how the charts affect each other. Initially the filter options, the red circles, define which blog items the scatter plot and the table lens should show. Secondly the user selects which objects, the green circles, should be displayed and compared in the bar chart, the green square.

4.4.2 Interaction

The visualization components are connected in a hierarchical way, see figure 4.10, each affecting other parts when the user performs an action. The filter component affects the data shown in the other charts. Furthermore the table lens and the scatter plot determine which two blogs are compared in the bar chart.

Preattentive processing (subsection 2.1.1) is used in the colouring to highlight the two elements in focus: orange for the blog of the user and purple for the selected blog. These two colours were selected because they stand out from the rest of the elements, which are coloured in different shades of blue. The colours are used throughout the entire comparison view to clarify the connection of the selected item in all components. When an item is selected in the table lens or scatter plot, the bar chart component displays an animation to draw attention to the comparison between the bars. This gives the user clear feedback on what the selection leads to and where to focus the attention.

4.5 The personal view

In the personal view, only statistics connected to the blog of the user are displayed. This gives the opportunity to show information over time, which allows the user to explore and compare different periods in the history of the blog.

Figure 4.11: In the personal view, focus and context was applied. The aim is to give the user a general view of the statistical history of the blog as well as allow the user to look at different periods of time in higher detail.

Figure 4.12: The overview time graph displays the entire time span from the first day of the blog to today. The aim of this chart is to give the user a quick general perspective of the development of the blog over time. The vertical axis represents the average number of daily visitors. The time graph also provides a slider functionality that allows the user to select a time period to be shown in the other charts.

4.5.1 Structure

The aim for the structure of the personal view is to incorporate focus and context (subsection 2.2.3). If the time axis is based on information from the initial date of the blog to the current date, a lot of information is expected to be displayed in one view. Therefore the application was divided into three different sections, each displaying a different level of detail, see figure 4.11. This was implemented in order to give an overview and enable the user to get more information by selecting a time interval of interest.

To allow the user to explore the development of the blog over time, two line graphs are used. The line graphs show the number of visitors over time. The low-detail overview time graph shown in figure 4.12 was designed to provide an overview. The graph is divided into three intervals: years since the blog started, the past 12 months and the past 30 days. The purpose of splitting the intervals is to show periods that are close to the present in higher detail.
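One way to build such a three-level time axis is to group the daily visitor counts by age. The sketch below is an illustration under stated assumptions: the record format (`date` as "YYYY-MM-DD", `visitors`), the function names and the exact cut-offs of 30 and 365 days are chosen for the example and are not taken from the BlogVis code.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse "YYYY-MM-DD" as a UTC timestamp to avoid time zone effects.
function toUtc(dateString) {
  return Date.parse(dateString + "T00:00:00Z");
}

// Decide the level of detail for a date: recent days are kept as days,
// the past year is grouped by month and everything older by year.
function bucketKey(dateString, todayString) {
  const ageInDays = Math.round((toUtc(todayString) - toUtc(dateString)) / DAY_MS);
  if (ageInDays < 30) return { level: "day", key: dateString };
  if (ageInDays < 365) return { level: "month", key: dateString.slice(0, 7) };
  return { level: "year", key: dateString.slice(0, 4) };
}

// Group daily records and compute the average number of daily visitors per group.
function buildOverview(records, todayString) {
  const buckets = new Map();
  for (const { date, visitors } of records) {
    const { level, key } = bucketKey(date, todayString);
    const id = level + ":" + key;
    const bucket = buckets.get(id) || { level, key, total: 0, days: 0 };
    bucket.total += visitors;
    bucket.days += 1;
    buckets.set(id, bucket);
  }
  return [...buckets.values()].map((b) => ({
    level: b.level,
    key: b.key,
    averageVisitors: b.total / b.days,
  }));
}

// Hypothetical daily records, oldest first.
const overview = buildOverview(
  [
    { date: "2010-05-01", visitors: 10 },
    { date: "2010-06-01", visitors: 30 },
    { date: "2012-11-10", visitors: 40 },
    { date: "2012-11-20", visitors: 60 },
    { date: "2013-02-19", visitors: 70 },
  ],
  "2013-02-20"
);
console.log(overview);
```

Each returned item corresponds to one point on the overview graph, with the vertical value given by `averageVisitors`.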

Figure 4.13: The high detailed time graph displays more information for a selected time period. Nine different time items, selected by the slider in the overview time graph, are represented by an arrow shape in the time graph. The user can select an item in the chart to explore in a higher detailed display. This item will always be in the center of the high detailed time graph.

Figure 4.14: The thermometer graph is a specialized component designed to show the values of the selected time item, as well as compare them to the maximum values achieved in the different intervals in the statistical history of the blog.

The high detailed time graph, shown in figure 4.13, displays a more detailed view of a chosen interval and enables the user to choose a time period that will be explored further.

When looking at a specific time period, the aim is to give the user an estimation of the blog activity for the selected interval. A specialized component was designed that takes advantage of the association skills of the user. The thermometer component, see figure 4.14, displays the selected item in comparison to the maximum value in the past days, months or years. The highest value of all items within an interval is represented by a thermometer that is filled to the top. The component is used for displaying visitor count, number of blog posts and average ratios for the selected time item.
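The fill level of a thermometer can be expressed as the selected value divided by the highest value in the interval. The following sketch assumes a hypothetical function name `thermometerFill` and sample values; it only illustrates the calculation described above.

```javascript
// Fraction (0 to 1) of the thermometer to fill: the selected value relative
// to the highest value among all items in the same interval.
function thermometerFill(selectedValue, intervalValues) {
  const max = Math.max(...intervalValues);
  if (max <= 0) return 0; // empty or all-zero interval: nothing to fill
  return Math.min(selectedValue / max, 1);
}

// Hypothetical monthly visitor counts for the past months.
const monthlyVisitors = [10, 50, 200];

console.log(thermometerFill(50, monthlyVisitors));  // a quarter of the maximum
console.log(thermometerFill(200, monthlyVisitors)); // the maximum itself fills the thermometer
```

The returned fraction can then be multiplied by the pixel height of the thermometer to get the height of the filled part.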

The geographical display, see figure 4.15 and section 2.2.4, shows the location of the blog visitors. It contains zoom and rotation options for the user to be able to navigate in the map. The pie chart, see figure 4.16 and section 2.2.4, displays all blog posts written in a specific year. The radius of each part of the chart displays the number of blog posts written in a specific month of the selected year.

(a) Geographical display. (b) Zoomed in geographical display.

Figure 4.15: The geographical display shows the location of the readers of the blog. The size of the red dots represents the number of readers in the area. The user is provided with zooming and rotation options to allow navigating in the map.

Figure 4.16: The pie chart shows the number of blog posts for each month in a specific year. The radius of each part of the circle is determined by the number of blog posts in that month.

Figure 4.17: The personal view of BlogVis is divided into three parts that are connected by the user interaction of the entire view. The slider in part (A) determines the time items shown in part (B). Furthermore the selected item in part (B) determines what information is displayed in part (C).

4.5.3 Interaction

The three parts of the application are connected and the actions performed by the user affect all components.

(A) The overview time graph, (A) in figure 4.17, affects the high detailed graph, (B) in figure 4.17. Its purpose is to show the entire time span of the blog and to allow the user to easily select different periods by using the slider.

(B) By selecting an item in the high detailed graph (B), the user affects the information section, (C) in figure 4.17, above it. The graph shows the time items that the user can select to explore in the information section.

(C) The information section is where a selected item is represented by different graphs and charts. The user can select between four different views: key values list, geographic display, thermometer graphs and blog post titles. An arrow option is also available for the user to select the previous or next time period in the time graph.

Chapter 5 Implementation

5.1 Extracting data

The first step in the visualization pipeline (subsection 2.2.1) is to collect and transform raw data. When developing public web-based applications it is preferable to protect the database connection. The data in BlogVis is therefore retrieved through the APIs of Bloggportalen and Twingly.com. By requesting a URL that contains a function name and in-parameters, a JSON variable (figure 5.2) is returned and used in the application. The data extraction is written in Python[8], which is used to query the MySQL database[6]. Figure 5.1 illustrates the data extraction pipeline, from the client side to the back-end server side, and the data returned in each step.
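A request URL of this kind can be assembled on the client side as sketched below. The base URL, the function name `getBlogs` and the parameter names are placeholders invented for the example; the real API of Bloggportalen is not documented in this thesis.

```javascript
// Build a request URL from a function name and its in-parameters.
// Base URL, function name and parameter names are hypothetical placeholders.
function buildApiUrl(baseUrl, functionName, params) {
  const url = new URL(functionName, baseUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

const exampleUrl = buildApiUrl("https://api.example.org/", "getBlogs", {
  category: "fashion",
  limit: 100,
});

console.log(exampleUrl);
```

Using `URL` and `searchParams` means that parameter values are percent-encoded automatically, which matters when a blog URL is itself passed as a parameter.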

Bloggportalen does not contain all the information needed in the application, such as links and number of blog posts, and therefore the search API of Twingly was used for retrieving this information. A problem that occurred was that the blogs do not have a shared key on Bloggportalen and Twingly, such as a unique id, that could connect one blog in each database. Therefore the URL of the blog was selected for this. In the data extraction from Bloggportalen's database, the URL of each blog is extracted and used as a search key in the search API of Twingly. Unfortunately some blogs have a URL that only redirects to another URL. Twingly's systems register the URL that the user is redirected to, while Bloggportalen stores the URL that the user specifies, which is often the first one. This leads to search errors for blogs whose URLs differ in the two databases. Since the URL was the only possible connection between the two databases, there was no solution to this problem other than providing the user with an error message that asks them to use the correct URL when registering the blog.
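The URL-based join can be sketched as follows. The record fields (`url`, `links`, `posts`) and the function names are assumptions for illustration. The normalization step is an addition of this sketch that removes trivial differences such as "www." and trailing slashes; it cannot resolve redirects, so blogs registered under a redirecting URL still end up in the unmatched list.

```javascript
// Reduce a blog URL to a comparable key: no scheme, no "www.", no trailing slash.
function normalizeBlogUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.toLowerCase().replace(/^www\./, "");
  const path = url.pathname.replace(/\/+$/, "");
  return host + path;
}

// Join two record lists on the normalized URL and collect blogs without a match.
function joinByUrl(portalBlogs, twinglyBlogs) {
  const twinglyByKey = new Map(
    twinglyBlogs.map((b) => [normalizeBlogUrl(b.url), b])
  );
  const matched = [];
  const unmatched = [];
  for (const blog of portalBlogs) {
    const other = twinglyByKey.get(normalizeBlogUrl(blog.url));
    if (other) {
      matched.push({ ...blog, links: other.links, posts: other.posts });
    } else {
      unmatched.push(blog.url); // candidates for the error message to the user
    }
  }
  return { matched, unmatched };
}

// Hypothetical records from the two sources.
const joined = joinByUrl(
  [
    { url: "http://www.example.com/blog/", visitors: 1200 },
    { url: "http://old-name.example.net/", visitors: 80 },
  ],
  [
    { url: "https://example.com/blog", links: 14, posts: 9 },
    { url: "http://new-name.example.net/", links: 2, posts: 1 },
  ]
);
console.log(joined);
```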

For the new functionality to be consistent with the previous API functionality, the data was returned as JSONP, a variant of JSON in which the payload is wrapped in a call to a callback function. The built-in D3 functionality for retrieving JSON does not support JSONP, and therefore the data was retrieved with jQuery.ajax [13] on the client side.
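A JSONP request with jQuery.ajax can look like the sketch below. The wrapper function `loadJsonp` and the URL are assumptions for the example; the jQuery object is passed in as a parameter so that the function does not depend on a global `$`, but in a browser page with jQuery loaded the global `$` would normally be passed.

```javascript
// A JSONP response is a script of the form
//   callbackName({"blogs": [ ... ]});
// instead of a bare JSON document.

// `jq` is the jQuery object (usually the global `$` in a browser page).
function loadJsonp(jq, url, onData) {
  jq.ajax({
    url: url,
    dataType: "jsonp", // jQuery adds the callback parameter and unwraps the response
    success: onData,
  });
}

// Usage in a page where jQuery is loaded (hypothetical URL):
// loadJsonp($, "https://api.example.org/getBlogs?category=fashion", function (data) {
//   console.log(data);
// });
```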

Figure 5.1: The process of extracting data from the API of Bloggportalen. By using a URL containing a function name and in-parameters, the correct methods are called in the Python script on the servers of Bloggportalen. The Python script then executes an SQL query against the database, which returns the blog data. On the client side a JSON variable containing the data extracted from the database is returned and used in the visual display stage (subsection 2.2.1).

Figure 5.2: The structure of a JSON variable returned when using the API functionality on Bloggportalen. The JSON variable represents the data table and is used in the visual mapping stage of the visualization pipeline (subsection 2.2.1).

One problem with gathering the JSON data was that the success function was executed before all the JSON loading was done. This caused the visualization to be drawn on the screen before all the information was gathered. To prevent the application from drawing the charts before the data was loaded, count variables, if-statements and nested functions were used.
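The counting approach can be sketched as a small helper that calls the drawing function only when all expected data sets have arrived. The names `createLoadCounter`, `drawCharts`, `portal` and `twingly` are hypothetical and chosen for this illustration.

```javascript
// Return a callback that collects loaded data sets and calls `onAllLoaded`
// exactly once, after `expectedCount` data sets have arrived.
function createLoadCounter(expectedCount, onAllLoaded) {
  const results = {};
  let loaded = 0;
  return function onLoaded(name, data) {
    results[name] = data;
    loaded += 1;
    if (loaded === expectedCount) {
      onAllLoaded(results);
    }
  };
}

// Hypothetical drawing function that needs both data sets.
function drawCharts(data) {
  console.log("drawing with", Object.keys(data));
}

// Each success callback of an asynchronous request reports to the counter.
const reportLoaded = createLoadCounter(2, drawCharts);
reportLoaded("portal", [{ visitors: 1200 }]);   // nothing is drawn yet
reportLoaded("twingly", [{ links: 14 }]);       // second data set: charts are drawn
```

The order in which the requests finish does not matter, since the drawing function only runs when the counter reaches the expected number.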

5.2 JavaScript toolkits

There are many different visualization toolkits in JavaScript. For developing the Twingly application, the toolkit should allow a high degree of customization. Some toolkits were therefore excluded because they provided finished visualization charts with few options to change and adapt.

D3.js[5] is a JavaScript visualization toolkit that is a further development of the Protovis toolkit[10]. D3 provides extensive functionality and many possibilities for customizing the visualizations.

The programming approach of D3.js is to create SVG items. SVG (Scalable Vector Graphics) is a vector graphics format supported by most web browsers. Each SVG object is bound to a data item, and the value and index of the item are used to specify the graphical representation of the object[5].
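The idea of deriving each graphical attribute from the value and index of a data item can be illustrated without the toolkit itself. The sketch below does not use the D3 API; `renderBars` and its accessor object are hypothetical and only mimic the principle by producing SVG markup as strings.

```javascript
// Build one <rect> element per data item. Each attribute is computed by an
// accessor that receives the value `d` and the index `i` of the item.
function renderBars(data, accessors) {
  return data.map(
    (d, i) =>
      `<rect x="${accessors.x(d, i)}" y="${accessors.y(d, i)}" ` +
      `width="${accessors.width(d, i)}" height="${accessors.height(d, i)}"/>`
  );
}

// Horizontal bars, similar in spirit to the table lens: the index decides
// the row and the value decides the width.
const bars = renderBars([120, 45], {
  x: () => 0,
  y: (d, i) => i * 12,
  width: (d) => d / 2,
  height: () => 10,
});

console.log(bars.join("\n"));
```

In D3 the same principle is expressed by binding a data array to a selection and passing functions of the value and index when setting attributes.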

The objects can be stored in an array which can be further used and changed in the application. This is helpful when connecting the different charts, where only one item is to be selected and updated. There are many more attributes to specify for the items, and also events such as mouse down, mouse over and mouse out. D3 also contains functionality for axes, scales and animations. This gives the developer the opportunity to affect all parts of the design of a component, but it also leads to more coding before a finished component can be displayed.

5.3 Code architecture

Figure 5.3 shows the main structure of the code for BlogVis with the four main HTML files as well as the event handlers and the component classes.

Public classes that contain measurement values, color values and div containers made it easier to develop the charts in a cohesive way. Since the size of the application was inherited from the screen size, it is important to store the measurement values and make them available to the entire application. Frequent changes, such as moving charts and changing colors, were facilitated by this structure.
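A shared set of measurement values can be derived once from the available size and then read by all charts. The following is a sketch under assumptions: the function name `createMeasurements`, the margin and the proportions between the components are invented for the example and do not describe the actual BlogVis layout.

```javascript
// Derive shared measurement values from the available width and height.
// The margin and the proportions are illustrative assumptions.
function createMeasurements(availableWidth, availableHeight) {
  const margin = 10;
  const width = availableWidth - 2 * margin;
  const height = availableHeight - 2 * margin;
  // Frozen so that no chart can change the shared values by accident.
  return Object.freeze({
    margin,
    filter: { width: Math.round(width * 0.2), height },
    tableLens: { width: Math.round(width * 0.3), height },
    scatterPlot: { width: Math.round(width * 0.5), height: Math.round(height * 0.6) },
    barChart: { width: Math.round(width * 0.5), height: Math.round(height * 0.4) },
  });
}

// In a browser this would typically be called with
// window.innerWidth and window.innerHeight.
const measurements = createMeasurements(1020, 620);
console.log(measurements.scatterPlot);
```

Moving a chart or changing its share of the screen then only requires a change in this one place.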

Figure 5.3: The code structure of BlogVis. Each main menu option is represented by an HTML document that contains the JavaScript classes for the charts and the event handlers, as well as the colour and measurement value classes.

Each chart was represented by a class with functionality to update the chart when a user interacts with the visual component. A simplified version of the observer design pattern[7] was implemented, see figure 5.4. For each section of BlogVis an event handler (the subject) was created that focused on handling the connection between the visualization components (the concrete observers) and the user interaction. The responsibility of the event handlers is to notify the charts in the observer collection when the user performs an action and make them perform the update.
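A simplified observer structure of this kind can be sketched as follows. The class names, the event format and the use of modern class syntax are assumptions for illustration and are not taken from the BlogVis source code.

```javascript
// Subject: keeps a collection of charts and notifies them about user actions.
class EventHandler {
  constructor() {
    this.observers = [];
  }
  attach(chart) {
    this.observers.push(chart);
  }
  notify(event) {
    this.observers.forEach((chart) => chart.update(event));
  }
}

// Concrete observer: compares the blog of the user with the selected blog.
class BarChart {
  constructor() {
    this.compared = [];
  }
  update(event) {
    if (event.type === "select") {
      this.compared = [event.userBlog, event.blog];
    }
  }
}

// Concrete observer: highlights the selected blog among the dots.
class ScatterPlot {
  constructor() {
    this.highlighted = null;
  }
  update(event) {
    if (event.type === "select") {
      this.highlighted = event.blog;
    }
  }
}

const comparisonHandler = new EventHandler();
const barChart = new BarChart();
const scatterPlot = new ScatterPlot();
comparisonHandler.attach(barChart);
comparisonHandler.attach(scatterPlot);

// A click in the table lens would lead to a single notification like this one.
comparisonHandler.notify({ type: "select", userBlog: "my-blog", blog: "other-blog" });
console.log(barChart.compared, scatterPlot.highlighted);
```

The component that receives the click only needs to know the event handler, not the other charts, which keeps the charts independent of each other.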

To prevent heavy data reloading, to avoid storing information in cookies and to avoid frequent redrawing, the elements of the application were drawn in JavaScript functions. New HTML documents were only used when the user toggled between the main menu options.

Figure 5.4: The observer pattern is a design pattern for the code architecture used in the implementation of BlogVis. It is based on a subject class that contains instances of and notifies the concrete observers. In BlogVis the subject class is the event handler and the concrete observers are represented by each chart.

Chapter 6 Conclusion and future work

6.1 Conclusion

Since many companies choose to store data and keep records of user interaction, the quantities of data grow larger every day. The development of information visualization has to keep up with this growth in terms of data mining, component design and usability. Since every information visualization has different tasks, datasets and user groups, there is no given template for an application design. Every application has to be customized and tailored for the user, data and task at hand. An application can be designed by following guidelines and drawing inspiration from previous work, but in developing an information visualization the creative freedom of the developer is great and should be utilized.

In the application design documented in this thesis, I combined well-known information visualization components with new customized charts. Nowadays data visualizations often strive towards showing many dimensions and complex correlations, which makes them accessible and useful only to users with expertise in different areas. The challenge in designing BlogVis lay in the opposite direction. The aim was to create a fun, informative and easily accessible application that could be used by people with less experience in information visualization. The challenge lay in creating something informative yet simple enough from a large quantity of information. This demanded a lot of my own creativity and inventiveness, as well as a thorough literature study. I found it important to collect knowledge from different areas of application design, and challenging to find the best possible way of combining that knowledge with new ideas.

6.2 Future work

6.2.1 Code architecture

The code structure could have been developed further before the implementation process began. JavaScript has many built-in features that support the use of different design patterns. In this project the MVC (Model View Controller) pattern could have been used. MVC would make the code more structured and easier to update and extend with more functionality, hence a study of how to implement the MVC pattern could be performed.

6.2.2 Usability

For further adjustment to the user group, usability tests can be performed. Getting feedback from the user group is a good way of learning whether the users managed to perform the task and turn the statistics into knowledge.

The application was developed for Google Chrome and Safari, which has caused usability problems in Firefox and Internet Explorer. To get the application to work in all browsers a lot of extra development has to be done, which was not a priority in this project. To avoid problems and extra work for the user, the application should work in all browsers. Furthermore the application can also be developed to work on tablets and smart phones.

6.2.3 BlogVis functionality

One additional functionality that would increase the user experience is to enable a search option for finding blogs in the dataset. The only current way to find specific blogs is by using the fish eye in the table lens or to mouse over the scatter plot dots. By enabling a search functionality the user can more efficiently find specific blogs to compare.

The filter functionality can also be further developed to provide the user with more options and allow the user to combine multiple filtering options. For example, one blog often has multiple categories, and the user should be able to choose which category should be explored further in the application. Other additional filtering options could be age, start date of the blog and subcategory.

To increase customer linking among the blog authors, a good idea is to give feedback on how customer linking affects the number of visitors for a blog. This can be done by showing customer links and visitors in the scatter plot.

An implementation of a search functionality would improve the user experience in the personal view. In this section the search option could be focused on blog post titles and blog post contents. There is no service on the market today that provides the user with the ability to search in their own blog. This additional search option could provide the user with the ability to find an old blog post by searching for the title or a content segment.

Allowing a logarithmic option in the linear time graphs would give the user a better overview of the time graph, since it would make differences more obvious. A blog often has a steady number of visitors per day, and the line graph then displays a more or less straight line. If the scale were logarithmic the differences would be more obvious for the user.

The geographic display can be much more detailed and informative. For example, the geographic areas and cities could be labelled. Additionally, the level of detail should increase when zooming in on different areas. Furthermore the interaction of the globe should be further developed and usability tested.

The period where the highest variable values were achieved should be shown. Currently the thermometer chart compares the current value with the maximum value, but the maximum is never shown in a clear way.

To increase usability the user should be able to move the high detailed graph directly with the mouse pointer. Currently the only ways to move along the time axis are to use the slider under the graph or the arrows next to the information panels.

Bibliography

[1] Facebook AB. Facebook. http://www.facebook.com/, 2013. [Online; accessed 20-Feb-2013].

[2] Twingly AB. Bloggportalen. http://Bloggportalen.se/, 2013. [Online; accessed 20-Feb-2013].

[3] Twingly AB. Twingly. http://www.twingly.com/, 2013. [Online; accessed 20-Feb-2013].

[4] Badre Albert. Shaping web usability. Addison-Wesley, 2002.

[5] Michael Bostock. Data-Driven Documents. http://d3js.org/, 2012. [Online; accessed 02-Jan-2013].

[6] Oracle Corporation. MySQL-The world’s most popular open source database. http://www.mysql.com/, 2013. [Online; accessed 02-Jan-2013].

[7] R.Johnson J.Vlissides E.Gamma, R.Helm. Design Patterns. Addison-Wesley, 1995.

[8] Python Software Foundation. Python Programming Language Official Website. http://www.python.org/, 2013. [Online; accessed 02-Jan-2013].

[9] Ellis Geoffey, Keim Daniel, Kohlhammer Jörn, and Mansmann Florian. Solving problems with visual analytics. Eurographics Association, 2010.

[10] Stanford Visualization Group. Protovis-a graphical toolkit for visualization. http://mbostock.github.com/protovis/docs/, 2010. [Online; accessed 20-Feb-2013].

[11] Instagram. Instagram. http://instagram.com/, 2013. [Online; accessed 20-Feb-2013].

[12] Gary Marsden Jones Matt. Mobile Interaction Design. John Wiley and sons, 2006.

[13] The jQuery Foundation. jQuery ajax API. http://api.jquery.com/jQuery.ajax/, 2013. [Online; accessed 02-Jan-2013].

[14] Schwaber Ken. Agile project management with Scrum. Microsoft Press, 2004.

[15] Spence Robert. Information Visualization. ACM Press Books, 2001.

[16] Krug Steve. Don’t make me think. A common sense approach to web usability. New riders publishing, 2006.

[17] Xurui Tan and Hao Tang. Inquiry into the shape design of display space based on visual perception. In Computer-Aided Industrial Design Conceptual Design (CAIDCD), 2010 IEEE 11th International Conference on, volume 1, pages 737-741, nov. 2010.

[18] D. Tang, C. Stolte, and R. Bosche. Design choices when architecting visualizations. In Information Visualization, 2003. INFOVIS 2003. IEEE Symposium on, pages 41-48, oct. 2003.

[19] Twitter. Twitter. https://twitter.com/, 2013. [Online; accessed 20-Feb-2013].

