Usable Transparency with the Data Track: A Tool for Visualizing Data Disclosures

(1)

http://www.diva-portal.org

Postprint

This is the accepted version of a paper presented at the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (CHI 2015 Extended Abstracts), Seoul, Republic of Korea, April 18–23, 2015.

Citation for the original published paper:

Angulo, J., Fischer-Hübner, S., Pulls, T., Wästlund, E. (2015)

Usable Transparency with the Data Track: A Tool for Visualizing Data Disclosures.

In: Bo Begole, Jinwoo Kim, Kori Inkpen, Woontack Woo (eds.), CHI EA '15 Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (pp. 1803-1808). ACM Digital Library

http://dx.doi.org/10.1145/2702613.2732701

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

(2)

Usable Transparency with the Data Track – A tool for visualizing data disclosures

Julio Angulo
Dept. of Information Systems, Karlstad University
julio.angulo@kau.se

Simone Fischer-Hübner
Dept. of Computer Science, Karlstad University
simone.fischer-huebner@kau.se

Tobias Pulls
Dept. of Computer Science, Karlstad University
tobias.pulls@kau.se

Erik Wästlund
Dept. of Psychology, Karlstad University
erik.wastlund@kau.se

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s).

Copyright is held by the owner/author(s).

CHI’15 Extended Abstracts, April 18–23, 2015, Seoul, Republic of Korea. ACM 978-1-4503-3146-3/15/04.

http://dx.doi.org/10.1145/2702613.2732701

Abstract

We present a prototype of the user interface of a transparency tool that displays an overview of a user’s data disclosures to different online service providers and allows users to access the data about them that is stored at the services’ sides. We explore one particular type of visualization method consisting of tracing lines that connect a user’s disclosed personal attributes to the services to which these attributes have been disclosed. We report on the ongoing iterative design process of this visualization, the challenges encountered, and the possibilities for future improvements.

ACM Classification Keywords

K.4.1 [Public Policy Issues]: Privacy; D.2.2 [Design Tools and Techniques]: User Interfaces

Introduction

Today, it is practically impossible for individuals to keep track of all the traces of personal information that they leave in their daily online activities. People rarely have a clear understanding of how data about them is collected, used, shared or accessed. Transparency online plays a key role in endowing individuals with control over their own data and in helping them make well-informed decisions at the moment of disclosing information. However, there is a lack of usable transparency-enhancing tools that inform people about the data handling and sharing practices of online services.

We present our ongoing work on the design of visualizations for a transparency tool, which we call the Data Track. This tool, which started as part of the European PRIME [1] and PrimeLife [2] projects and continues as part of the A4Cloud project [3], provides users with an overview of the data that they have disclosed to different online services under an agreed-upon policy. It also gives users (even pseudonymous users) the possibility to exercise their rights by letting them access the data about them stored remotely at the services’ sides and, if the service allows it, request the correction or removal of data attributes from that service. Explanations of the security and privacy mechanisms that the Data Track uses to log a user’s disclosures are found in [18], and a description of its earlier UI can be found in [5,16].

Related work

Publicly available transparency tools include Mozilla’s Lightbeam [15] and the Google Dashboard [7], among others. Other recent services that explore approaches to give users back control of their own data, such as datacoup.org, are starting to emerge. Detailed overviews of other transparency-enhancing tools are presented in [9] and [10]. Research efforts have also identified the challenges, risks and benefits of Personal Data Services (PDS) [4]. Some of the identified benefits included increased transparency for users and the allocation of greater “collective power through their access and use of individual and community level data” [4]. Recognized challenges included the possible confusion that users might have between controlling their data on their devices and accessing it on a remote server, as well as the issue of making sense of large amounts of data through meaningful visualizations, which is a challenge in many other types of big data collection activities [11,14,20]. In particular, Kolter et al. [13] proposed displaying personal data disclosures from different comprehensible visualization perspectives, concluding that participants found a graph view useful and intuitive for visualizing online relationships. Similarly, the VOME project [12] studied a so-called ‘interactive social translucence map’, and Zavou et al. [21] explored a ‘chord diagram’ visualization to inform cloud users and service providers about the way data is exchanged in an online transaction. It has also been suggested in [8] that network visualizations have the potential to cope with the scale and dimensionality often encountered in data related to security monitoring and mitigation. In [6], diagrams displaying nodes and links have proved to be effective when analyzing data flows, and using alternating colours, varying line sizes and icons to convey information about events in a person’s life has been proposed in [17].

The trace view visualization

Inspired by prior work on the design of transparency tools and by information visualization guidelines, we envision a visualization technique for the Data Track which we refer to as the trace view (Figure 1).

(4)

In the trace view, the user is represented by a profile picture in the middle of the screen, with the intention of giving users the feeling that this interface is a place that focuses on them (i.e., data about them and services that they have contacted). The UI (Figure 1) is then separated into two main panels: the services to which the user has released information appear in the bottom panel, and the data attributes that the user has released to these services appear in the top panel. By clicking on one (or many) of the services at the bottom, the interface shows a trace from the service to the user, and then from the user to the data items that she has released to that specific service. If the user instead clicks on a data attribute at the top, the trace shows which online services have that particular attribute. The traces are coloured so that they are easy to tell apart.
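The two-way lookup behind the trace view can be thought of as an index over a bipartite graph of services and attributes. The following Python sketch assumes that the Data Track’s log can be reduced to a set of (service, attribute) disclosure pairs. The class and method names (`DisclosureLog`, `attributes_for`, `services_for`) and the example data are hypothetical and are not taken from the Data Track code base.

```python
from collections import defaultdict


class DisclosureLog:
    """Hypothetical index of which attributes were disclosed to which services."""

    def __init__(self, disclosures):
        # disclosures: iterable of (service, attribute) pairs
        self._by_service = defaultdict(set)
        self._by_attribute = defaultdict(set)
        for service, attribute in disclosures:
            self._by_service[service].add(attribute)
            self._by_attribute[attribute].add(service)

    def attributes_for(self, *services):
        """Attributes to trace when one or many services in the bottom panel are selected."""
        traced = set()
        for service in services:
            traced |= self._by_service.get(service, set())
        return traced

    def services_for(self, attribute):
        """Services to trace when an attribute in the top panel is selected."""
        return set(self._by_attribute.get(attribute, set()))


# Illustrative data only ("Webmail" is a made-up second service).
log = DisclosureLog([
    ("Bookstore", "name"),
    ("Bookstore", "email"),
    ("Bookstore", "address"),
    ("Webmail", "name"),
    ("Webmail", "email"),
])
print(sorted(log.attributes_for("Bookstore")))  # ['address', 'email', 'name']
print(sorted(log.services_for("email")))        # ['Bookstore', 'Webmail']
```

Selecting several services takes the union of their attributes, which mirrors the "one (or many)" selection described above. Keeping both directions pre-indexed means that neither kind of click requires a scan over the whole log.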

Figure 2: A node representing a service provider. By clicking on the cloud icon, users can access their data located at the services’ side.

Figure 3: Information about a user stored at the services’ side.

The services in the bottom panel have a button represented by a cloud icon from which users can also access the data about them stored on the services’ sides (Figure 2). Clicking this button opens a modal dialog where users can review the data about them that the selected service has stored in its databases (Figure 3). Contrasting colours, an explicit headline and adequate spacing are used to distinguish data that was explicitly submitted by the user from data that has been implicitly collected by the service provider or inferred from analysis of aggregated data attributes. In this view, users can also exercise their rights to correct or remove data about them, for instance by requesting to update their address or to delete their location history.
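One way to organise the dialog’s content is to group each stored record by how it was obtained, and to express a correction or removal as a structured request. The sketch below assumes that each remote record carries a provenance label (explicit, implicit or inferred). The `Record`, `group_by_provenance` and `make_request` names and the request format are hypothetical, because the paper does not specify how requests are sent to a service.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    EXPLICIT = "explicitly submitted by the user"
    IMPLICIT = "implicitly collected by the service"
    INFERRED = "inferred from aggregated data"


@dataclass(frozen=True)
class Record:
    attribute: str
    value: str
    provenance: Provenance


def group_by_provenance(records):
    """Group remote records under one headline per provenance, in enum order."""
    groups = {p: [] for p in Provenance}
    for record in records:
        groups[record.provenance].append(record)
    return groups


def make_request(record, new_value=None):
    """Build a hypothetical correction request (new_value given) or removal request."""
    if new_value is None:
        return {"action": "remove", "attribute": record.attribute}
    return {"action": "correct", "attribute": record.attribute, "value": new_value}


# Illustrative records only.
remote = [
    Record("address", "Old Street 1", Provenance.EXPLICIT),
    Record("location", "Karlstad, Sweden", Provenance.IMPLICIT),
    Record("interest", "science fiction", Provenance.INFERRED),
]
groups = group_by_provenance(remote)
for provenance, items in groups.items():
    print(provenance.value, [r.attribute for r in items])
print(make_request(remote[0], "New Street 2"))  # correct the address
print(make_request(remote[1]))                  # remove the location
```

Using an enumeration rather than free-text labels keeps the set of headlines fixed, so that every record ends up in exactly one visually distinct section of the dialog.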

The trace view visualization (Figure 1) was designed this way to employ the capabilities of the Data Track to address the challenges of a PDS identified above, and to handle two main use cases: (1) helping users know “what information have I sent to which online services?” by displaying the data attributes that they have disclosed to various services, and (2) allowing users to check “what do online services know about me at this point in time?” by showing users the information about them that is stored at the services’ sides.
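Combining the two use cases lets users compare what they sent with what a service reports holding. A minimal sketch follows, assuming that both the local disclosure log and the service’s reply can be reduced to sets of attribute names. The function name `compare_with_remote` is hypothetical.

```python
def compare_with_remote(sent, stored):
    """Compare attributes the user sent with attributes a service reports storing."""
    sent, stored = set(sent), set(stored)
    return {
        "sent_and_stored": sent & stored,
        # Stored but never sent: collected implicitly or inferred by the service.
        "stored_not_sent": stored - sent,
        # Sent but missing from the reply, for instance if the service deleted it.
        "sent_not_stored": sent - stored,
    }


# Illustrative sets only.
result = compare_with_remote(
    sent={"name", "email", "address"},
    stored={"name", "email", "address", "location", "purchase history"},
)
print(sorted(result["stored_not_sent"]))  # ['location', 'purchase history']
```

The non-empty "stored but not sent" set is what makes data that has been implicitly collected or inferred visible, as opposed to data that was explicitly submitted.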

Usability evaluation activities

To assess the usability of the trace view, as well as users’ mental models and the features of the tool they deemed important, we have so far performed two iterative rounds of usability testing and one workshop session.

Usability testing

For this activity, we set up a scenario in which participants were asked to buy a book from a fictitious online bookstore. To complete the transaction, they were required to submit personal information to the bookstore. After buying the book, participants were shown the trace view interface (Figure 1), and a test moderator asked a series of questions while participants interacted with the prototype. Participants were encouraged to think aloud as they interacted with the prototype. Table 1 shows participant demographics for the two rounds of testing, and Table 2 lists the questions used in both rounds, although it reports results only for the second round. The order of the questions was semi-randomized to minimize confounding effects from question order, and a set of success criteria was defined for each question.
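The paper does not specify how the question order was semi-randomized. One common scheme, shown here purely as an assumption, keeps questions in fixed thematic blocks and shuffles the order within each block for every participant. The generator is seeded with the participant ID so that each session’s order can be reproduced. The block grouping and the short question labels below are hypothetical.

```python
import random


def semi_randomized_order(blocks, participant_id):
    """Shuffle questions within each block while keeping the block order fixed.

    Seeding with the participant ID (an assumption) makes each order reproducible.
    """
    rng = random.Random(participant_id)
    order = []
    for block in blocks:
        shuffled = list(block)
        rng.shuffle(shuffled)
        order.extend(shuffled)
    return order


# Hypothetical grouping of a subset of the Table 2 questions.
blocks = [
    ["elements on top", "elements at bottom"],
    ["info sent to bookstore", "services given email", "access remote data"],
    ["who can access remote data", "where records are stored", "who can access records"],
]
print(semi_randomized_order(blocks, "TP201"))
```

Shuffling only within blocks is one way to limit order effects while keeping related questions together. For example, the questions about the panels are still asked before the questions about who can access the data.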

The first evaluation round (n = 14) was done with an earlier version of the prototype than the one shown in Figures 1–3. Results revealed that participants had a hard time grasping the idea that the Data Track allowed them to access and manipulate data about them stored on the services’ side, and the UI did not make it obvious how to access these remotely located data (only 4 participants understood this task). Figure 4 shows participants’ responses to statements about the tool: the purpose of the tool was well understood, but participants were sceptical towards the functionality that allows them to manipulate their data at the services’ side. Improvements were made for the next design iteration, for instance, by adding a first-run introductory tour explaining the different aspects of the interface and the capabilities of the Data Track, and by increasing the visual affordance of the button to access the data located at the services’ sides (Figure 2) and of the dialog displaying these remotely located data (Figure 3).

(a) (b) (c) (d) 7 1 100% 80% 60% 40% 20% 0%

Figure 4: Participants’ answers on a scale from 1 (Strongly disagree) to 7 (Strongly agree) to the statements (a) This program helps me see the Internet services to which I have given my information; (b) This program helps me see which information Internet services have about me; (c) If I regret sending information to an Internet service, this program helps me remove that information; (d) This program helps me get a good overview of who knows what about me.
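Figure 4 appears to summarize the answers as the share of responses at each point of the 1–7 scale, per statement. The sketch below shows how such shares could be computed. The function name is hypothetical, and the sample responses are illustrative only: they are not the data plotted in Figure 4.

```python
from collections import Counter


def likert_shares(responses, scale=range(1, 8)):
    """Return the percentage of responses at each point of a 1-7 Likert scale."""
    if not responses:
        raise ValueError("no responses to summarize")
    counts = Counter(responses)
    total = len(responses)
    # Counter returns 0 for scale points nobody chose, so every point appears.
    return {point: 100 * counts[point] / total for point in scale}


# Illustrative responses only; not the study's data.
example = [7, 7, 6, 5, 4, 2, 7, 6]
shares = likert_shares(example)
print(shares)
```

Each statement, (a) to (d), would be summarized separately. The per-point percentages then add up to 100 and can be drawn as one stacked bar per statement.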

Table 1: Participants of the first (TP100, n = 14) and second (TP200, n = 17) rounds of usability testing

ID     Age    Technology literacy
TP101  18-23  Somewhat experienced
TP104  18-23  Somewhat experienced
TP103  18-23  Experienced
TP105  18-23  Experienced
TP102  18-23  Very experienced
TP106  24-30  Experienced
TP107  24-30  Experienced
TP108  24-30  Experienced
TP112  24-30  Experienced
TP109  24-30  Very experienced
TP110  24-30  Very experienced
TP111  24-30  Very experienced
TP114  31-40  Somewhat experienced
TP113  31-40  Experienced
TP201  18-23  Not experienced at all
TP213  18-23  Not too experienced
TP205  18-23  Somewhat experienced
TP206  18-23  Somewhat experienced
TP209  18-23  Somewhat experienced
TP214  18-23  Somewhat experienced
TP203  18-23  Experienced
TP207  18-23  Experienced
TP216  18-23  Experienced
TP217  18-23  Experienced
TP202  18-23  Very experienced
TP212  18-23  Very experienced
TP204  24-30  Not too experienced
TP210  24-30  Not too experienced
TP215  24-30  Somewhat experienced
TP208  24-30  Experienced
TP211  24-30  Experienced

A second round of testing (n = 17) was done after addressing the feedback obtained from the first iteration. Results according to the specified criteria for successful completion are shown in Table 2. This time, all participants correctly identified the services to which they had sent a particular attribute, and all of the attributes that were sent to a particular service. Only 9 of the 17 participants were able to access the information that the bookstore had about them on its servers, indicating that further improvements are needed. Other possible refinements to the interface were also identified; for instance, one participant stated that “if there would be a lot of things on the top, it would be good if there would be a zoom when I select something”, so we are working on making selected items more prominent and displaying additional properties about them. It was also noted that participants still had the wrong mental model with regard to the security and privacy mechanisms of the Data Track, since only 7 participants correctly believed that nobody but themselves had access to the information stored in the program.

Table 2: Results from the second evaluation round

Success Partial Fail

What do you think the elements on the top represent?

8 8 1

What do you think the elements at the bottom represent?

12 3 2

How can you see the information that you have sent to the bookstore?

12 5

How can you see to which Internet services you have given your email address?

17

Where would you click to see the information that the bookstore has stored on their servers when you purchased the book?

9 4 4

What information about you does the bookstore have on their servers?

15 2

In your opinion, who can access your data that the bookstore has stored on their servers?

8 3 6

Did the bookstore store the location you were at when you bought the book?

16 1

Is the information that the bookstore has about you more or less than what you gave to them? Why do you think this is?

14 3

Where do you think the records shown in the top panel of the Data Track are stored?

13 4

In your opinion, who has access to the records being shown in the top panel of the Data Track?

7 1 9

What would you do to edit the information that the bookstore has stored on their servers?

16 1
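To compare tasks at a glance, the counts in Table 2 can be converted into success rates among the 17 participants. The sketch below uses only the rows that report all three outcomes, plus the e-mail question, which all 17 participants solved according to the text. The short row labels are abbreviations introduced here.

```python
N = 17  # participants in the second round (TP200)

# (success, partial, fail) counts taken from Table 2.
results = {
    "elements on the top": (8, 8, 1),
    "elements at the bottom": (12, 3, 2),
    "services given e-mail address": (17, 0, 0),
    "access bookstore's stored data": (9, 4, 4),
    "who can access bookstore's data": (8, 3, 6),
    "who can access Data Track records": (7, 1, 9),
}


def success_rate(counts, n=N):
    """Percentage of participants who met the success criterion."""
    if sum(counts) != n:
        raise ValueError("counts must add up to the number of participants")
    return 100 * counts[0] / n


for question, counts in results.items():
    print(f"{question}: {success_rate(counts):.0f}% success")
```

The output reflects the pattern discussed in the text. Locating the remotely stored data (about 53%) and understanding who can access the Data Track’s records (about 41%) remain the weakest points, compared with 100% for tracing an attribute to the services that received it.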

Workshop

We also organized a three-hour workshop divided into two sessions (n = 19, Table 3). In the first session, participants were shown an introductory presentation of the trace view prototype (Figure 1). They were then split into two groups to encourage discussion and minimize groupthink. Moderators led the discussions with a set of predefined questions (Table 4). In their discussions, participants recognized advantages of the tool, stating that “it would keep me informed and hold big companies in line”, “makes you aware of what information you put on the Internet, you probably would be more careful” and “[the UI] is easy to understand”. Participants also commented on downsides: “if there is one tool collecting all the data, then it is a single point of failure...”.

(6)

Figure 5: Sketch of the latest trace view interface

Table 3: Participants of the workshop session (n = 19)

ID    Age    Technology literacy
WP04  18-23  Not too experienced
WP07  18-23  Not too experienced
WP06  18-23  Somewhat experienced
WP01  18-23  Experienced
WP03  18-23  Experienced
WP05  18-23  Experienced
WP02  18-23  Very experienced
WP15  24-30  Not too experienced
WP09  24-30  Somewhat experienced
WP14  24-30  Somewhat experienced
WP16  24-30  Somewhat experienced
WP08  24-30  Experienced
WP10  24-30  Experienced
WP12  24-30  Experienced
WP13  24-30  Experienced
WP11  24-30  Very experienced
WP17  31-40  Experienced
WP18  41-50  Not too experienced
WP19  41-50  Not too experienced

Table 4: Predefined questions that helped moderators lead the discussions during the workshop

Workshop discussion questions

• Do you think you will use this tool?

• Under which contexts would this tool be useful for you?

• What did you like the most about this tool and/or its interface?

• What did you like the least about this tool and/or its interface?

• Would you trust this tool if it tells you that “all your data will be encrypted and only you would be able to decrypt it”?

• If this tool could notify you about incidents that affect your data, what would you like this tool to notify you about?

In the second session, participants were introduced to ideas for the next design iteration of the trace view, shown in Figure 5. This design portrays a more realistic scenario, in which the user discloses many more data attributes to many service providers, thus forcing considerations of scalability and users’ cognitive capabilities. Filtering mechanisms also have to be in place to allow users to see only the data they might care about, e.g., the latest or most frequently released personal attributes. The idea in this design is that when users hover the mouse over an item, an in-place container expands to show some details about that item. If the item is clicked, tracing lines indicate the item’s relationships to other items, which also expand to show more details about themselves. Additional information, such as implicitly collected data or the number of disclosures, could be conveyed using varying colours, line thicknesses, box sizes and icons.
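The filtering and encoding ideas for this design can be sketched as follows, assuming that each disclosure is logged with an attribute name and a timestamp. The helper names, the top-k cut-off and the linear mapping from disclosure counts to line thickness are assumptions made for illustration. They are not part of the sketched design.

```python
from collections import Counter
from datetime import datetime


def most_frequent(disclosures, k):
    """The k attributes released most often."""
    counts = Counter(attribute for attribute, _ in disclosures)
    return [attribute for attribute, _ in counts.most_common(k)]


def latest(disclosures, k):
    """The k attributes whose most recent release is newest."""
    last_seen = {}
    for attribute, when in disclosures:
        if attribute not in last_seen or when > last_seen[attribute]:
            last_seen[attribute] = when
    return sorted(last_seen, key=last_seen.get, reverse=True)[:k]


def line_width(count, max_count, min_px=1.0, max_px=8.0):
    """Map a disclosure count linearly to a tracing-line thickness in pixels."""
    return min_px + (max_px - min_px) * count / max_count


# Illustrative log of (attribute, time of release) pairs.
log = [
    ("email", datetime(2015, 1, 5)),
    ("name", datetime(2015, 1, 5)),
    ("email", datetime(2015, 2, 1)),
    ("location", datetime(2015, 3, 9)),
    ("email", datetime(2015, 3, 9)),
]
print(most_frequent(log, 1))  # ['email']
print(latest(log, 2))
print(line_width(3, 3))       # 8.0
```

A linear mapping is the simplest choice. With many disclosures, a logarithmic or capped scale might keep a single heavily used attribute from dominating the view, but that would need to be tested with users.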

Participants of the workshop commented on the differences between this UI (Figure 5) and the previous one (Figure 1), stating that the new UI is “more playful” but that “too many icons might increase the time to search for a particular icon”. They also suggested additional improvements to the UI, such as categorizing data attributes according to their perceived sensitivity and ranking service providers by their reputation and/or the level of protection they give their users’ data.

Discussions and next steps

In the course of our work, we have identified important themes for the UI design of transparency tools similar to the Data Track: (1) there is a critical need for increased transparency of the data collected implicitly or inferred from analysis; this problem was also recognized by Rader [19], who found that even web-savvy individuals are not fully aware of the aggregation of data done by third-party services; (2) visualizations of a user’s data disclosures, especially in the era of the Internet of Things, have to consider perceptual and interactive scalability coupled with big data, as suggested in [14]; (3) for users to trust a transparency tool like the Data Track, its security and privacy features have to be made obvious to them; and (4) an intuitive visual distinction has to be made between the data that is stored securely on the users’ device under their control and the data that is stored remotely at the services’ sides. We are designing the new versions of the prototype with these considerations in mind. Besides the trace view visualization presented here, we are working on prototypes that visualize data disclosures in other forms; for instance, a chronological representation (Figure 6) can be useful for searching for data released on a particular date, and treemaps can show the service providers that have collected the most data.
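Both alternative views could be driven from the same disclosure log. The hedged sketch below assumes records of the form (service, attribute, date). Grouping by date supports the chronological search, and per-service counts could serve as treemap area weights. The function names and example records are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def by_date(disclosures):
    """Group (service, attribute) pairs by release date, oldest first, for a timeline."""
    timeline = defaultdict(list)
    for service, attribute, day in disclosures:
        timeline[day].append((service, attribute))
    return dict(sorted(timeline.items()))


def treemap_weights(disclosures):
    """Number of disclosed attributes per service, usable as treemap areas."""
    return Counter(service for service, _, _ in disclosures)


# Illustrative records only.
log = [
    ("Bookstore", "name", date(2015, 1, 5)),
    ("Bookstore", "address", date(2015, 1, 5)),
    ("Webmail", "email", date(2015, 2, 1)),
    ("Bookstore", "email", date(2015, 2, 1)),
]
print(by_date(log)[date(2015, 2, 1)])       # [('Webmail', 'email'), ('Bookstore', 'email')]
print(treemap_weights(log).most_common(1))  # [('Bookstore', 3)]
```

Counting attributes per service is only one possible size measure. The number of disclosure events, or a weighting by attribute sensitivity as suggested in the workshop, would be equally plausible choices.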

(7)

To evaluate the usefulness of different Data Track visualizations in more realistic contexts of use, appropriate methodologies have to be considered. We see an opportunity to perform usability tests with real users’ disclosures to services that they commonly interact with. This could be done by developing plug-ins that connect to the APIs of various services and fetch some of the users’ disclosures, which can then be visualized through the Data Track’s UI. We believe that methods for sampling users’ experiences in situ, such as the experience sampling or day reconstruction methods, can provide interesting insights into the motivations and situations in which users would make use of transparency tools to manage their disclosed data.
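A plug-in architecture of this kind might look like the following sketch. The `ServicePlugin` protocol, its `fetch_disclosures` method and the stub plug-in are hypothetical. Real plug-ins would depend on each service’s own API and authentication scheme, which are not described here.

```python
from typing import Iterable, List, Protocol, Tuple

Disclosure = Tuple[str, str]  # (service, attribute)


class ServicePlugin(Protocol):
    """Hypothetical interface that each service-specific plug-in would implement."""

    name: str

    def fetch_disclosures(self) -> Iterable[Disclosure]:
        ...


class StubBookstorePlugin:
    """Stand-in plug-in that returns fixed data instead of calling a real API."""

    name = "Bookstore"

    def fetch_disclosures(self) -> Iterable[Disclosure]:
        return [(self.name, "name"), (self.name, "email")]


def collect(plugins: List[ServicePlugin]) -> List[Disclosure]:
    """Merge the disclosures from all plug-ins into one list for visualization."""
    merged: List[Disclosure] = []
    for plugin in plugins:
        merged.extend(plugin.fetch_disclosures())
    return merged


print(collect([StubBookstorePlugin()]))
```

A structural `Protocol` lets third parties write plug-ins without inheriting from a Data Track base class, and stub plug-ins like the one above allow the visualizations to be tested before any real API is connected.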

Figure 6: A lo-fi sketch of a timeline visualization of data disclosures, to be implemented and tested in following iterations

References

[1] Privacy and Identity Management for Europe, 2010. http://www.prime-project.eu.

[2] Privacy and Identity Management in Europe for Life, 2012. http://primelife.ercim.eu/.

[3] Accountability for cloud and other future Internet services, 2013. http://www.a4cloud.eu.

[4] Acquisti, A., et al. ’My Life, Shared’ - Trust and Privacy in the Age of Ubiquitous Experience Sharing. Dagstuhl Reports 3, 7 (2013), 74–107.

[5] Fischer-Hübner, S., Hedbom, H., and Wästlund, E. Trust and Assurance HCI. In PrimeLife - Privacy and Identity Management for Life. Springer, June 2011.

[6] Ghoniem, M., Fekete, J., and Castagliola, P. A comparison of the readability of graphs using node-link and matrix-based representations. In INFOVIS ’04, IEEE (2004), 17–24.

[7] Google. Google dashboard. https://www.google.com/settings/dashboard.

[8] Harrison, L., and Lu, A. The future of security visualization: Lessons from network visualization. Network, IEEE 26, 6 (2012), 6–11.

[9] Hedbom, H. A survey on transparency tools for enhancing privacy. In The future of identity in the information society, Springer (2009), 67–82.

[10] Janic, M., et al. Transparency enhancing tools: An overview. In STAST ’13, IEEE (2013), 18–25.

[11] Kairam, S., et al. GraphPrism: compact visualization of network structure. In AVI ’12, ACM (2012), 498–505.

[12] Kani-Zabihi, E., and Helmhout, M. Increasing service users’ privacy awareness by introducing on-line interactive privacy features. In NordSec ’12, Springer (2012), 131–148.

[13] Kolter, J., et al. Visualizing past personal data disclosures. In ARES ’10, IEEE (2010), 131–139.

[14] Liu, Z., Jiang, B., and Heer, J. imMens: Real-time visual querying of big data. In Computer Graphics Forum, vol. 32 (2013), 421–430.

[15] Mozilla. Lightbeam add-on for Firefox. https://www.mozilla.org/en-US/lightbeam/.

[16] Pettersson, J. S., et al. Outlining Data Track: Privacy-friendly data maintenance for end-users. In ISD ’06, Springer (2006).

[17] Plaisant, C., et al. LifeLines: visualizing personal histories. In CHI ’96, ACM (1996), 221–227.

[18] Pulls, T. Privacy-friendly cloud storage for the Data Track – An educational transparency tool. In NordSec ’12, vol. 7617, Springer (2012), 231–246.

[19] Rader, E. Awareness of behavioral tracking and information privacy concern in Facebook and Google. In SOUPS ’14 (July 2014).

[20] Wolfe, P. J. Making sense of big data. Proceedings of the National Academy of Sciences 110, 45 (2013), 18031–18032.

[21] Zavou, A., et al. Cloudopsy: An autopsy of data flows in the cloud. In HCII 2013, vol. 8030 (July 2013), 366–375.