
Using Privacy Indicators to Nudge Users into Selecting Privacy Friendly Applications

Användandet av integritetsindikatorer för att peta användare till att använda integritetsvänliga applikationer

Anton Bromander

Faculty of Health, Science and Technology Computer Science

Master Thesis, 30 ECTS credits

Supervisor: Lothar Fritsch, Nurul Momen Examiner: Simone Fisher-Hübner 2019-06-19


Using Privacy Indicators to Nudge Users into Selecting Privacy Friendly Applications

Anton Bromander

5 June 2019


Abstract

In the Play Store today, users are conveniently shown download counts, app ratings, reviews, screenshots, etc. when deciding whether to download an application. If users are interested in viewing privacy information about the application, however, it is multiple clicks away, and there is no standard way of displaying it. This is where privacy indicators come in handy.

With privacy indicators, data can be analyzed and displayed to users in a way they understand, even if they do not understand what the data itself means or what is dangerous. This, however, comes with the challenge of deciding what is dangerous and what is not.

This report creates and implements an app store that displays added privacy information to the user in the form of a privacy indicator together with some detailed information about each application. To test the effectiveness of the privacy indicator, a small-scale study was conducted. It was discovered that users who were not already interested in privacy paid little attention to the indicator, while those who were took it into account when deciding which applications to download.


Contents

1 Introduction
  1.1 Research Question
  1.2 Partial Identities
2 Background
  2.1 Start set
    2.1.1 Visualization methods
    2.1.2 Classification of dangerous apps
    2.1.3 Privacy management methods
  2.2 Forward snowballing
    2.2.1 Visualization methods
    2.2.2 Privacy management methods
  2.3 Backward snowballing
    2.3.1 Visualization methods
    2.3.2 Privacy management methods
3 Project setup
  3.1 Project Specification
  3.2 Choice of tools
    3.2.1 Programming languages
    3.2.2 Environment
    3.2.3 Libraries
4 Design
  4.1 Google Play Store
  4.2 Privacy Indicators
  4.3 Privacy Score
    4.3.1 Permission Count
    4.3.2 Partial Identity count
    4.3.3 Partial Identity score
  4.4 Advanced statistics
  4.5 Data handling
  4.6 User study
5 Implementation
  5.1 Code structure
    5.1.1 System architecture
  5.2 Patching to APK File
  5.3 Patching to exe File
  5.4 User interface
  5.5 Filter Applications
6 User study
  6.1 Both groups
  6.2 Control group
  6.3 Study group
  6.4 Discussion
7 Conclusion
  7.1 Future Work
References
A Appendix


1 Introduction

When applications collect data, they need to ask for permission to do so. There is a multitude of permissions that can be requested, such as RUN_IN_BACKGROUND and ACCESS_FINE_LOCATION. These permissions are essential to different degrees for different applications: giving a GPS application access to your location is essential, while the same permission is not essential in, for example, a game that only needs the GPS for localized advertisements. This greatly increases the difficulty of deciding which permissions are more dangerous than others. To work around this, partial identities are used (see section 1.2). With partial identities it can be determined how much of our total identity is consumed by an application; the more that is consumed, the more dangerous the application is.

The goal of this project is to create a modified version of the Play Store that additionally displays the privacy information of each application with a privacy indicator. The research questions are described further in section 1.1.

This report also contains a literature review to understand what has already been done with regard to privacy indicators.

1.1 Research Question

The research question handed down in the project description is: "How can we communicate privilege-induced privacy risks to the user prior to app installation by using data from app profiling?" This question has two aspects, which become the two questions focused on in this report:

• RQ1: How can risk be conceptualized from app profiling data?

• RQ2: How can this information inform users in useful ways before installation?

1.2 Partial Identities

With partial identities [1], permissions are grouped into more graspable categories. There are eight different partial identities: Whereabouts, NetworkID, GoogleID, BiometricsID, PhoneNumber, Address, Area and SocialGraph. Each permission can contribute to multiple partial identities as a partial identity attribute. These can be direct attributes or indirect attributes. For an attribute to be direct, it needs to be possible to directly gain access to that partial identity using only that permission. If it is indirect, some additional research or use of third-party services is required. For example, USE_FINGERPRINT directly accesses your BiometricsID, while CAMERA accesses BiometricsID indirectly due to the need for some face recognition service to identify you, in contrast to your fingerprint, which is the identifier itself. Table 1 describes which permissions build up each partial identity.

Table 1: Partial identity table. Direct permissions are written in bold.

Partial ID: Permissions

Whereabouts: ACCESS_FINE_LOCATION, ACCESS_WIFI, BLUETOOTH, NFC, ACCESS_NETWORK, READ_EXTERNAL_STORAGE, READ_CALENDAR, CAMERA

NetworkID: ACCESS_NETWORK, ACCESS_WIFI

GoogleID: GET_ACCOUNTS

BiometricsID: USE_FINGERPRINT, CAMERA, BODY_SENSORS, ACCESS_AUDIO

PhoneNumber: GET_ACCOUNTS, ACCESS_FINE_LOCATION, CALL_PHONE, SEND_SMS

Address: ACCESS_FINE_LOCATION, CALL_PHONE, SEND_SMS

Area: ACCESS_COARSE_LOCATION, READ_EXTERNAL_STORAGE, READ_CALENDAR, ACCESS_WIFI, ACCESS_NETWORK

SocialGraph: READ_CONTACTS, READ_CALL_LOG, PROCESS_OUTGOING_CALLS, RECEIVE_MMS
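The mapping in Table 1 can be represented as a simple lookup. A minimal Python sketch (the underscores in the permission names are restored from the Android convention; which attributes are direct is defined in [1] and is not encoded here):

```python
# Table 1 as a lookup: partial identity -> the permissions that contribute
# to it. Direct/indirect status is not encoded in this sketch.
PARTIAL_IDENTITIES = {
    "Whereabouts": {"ACCESS_FINE_LOCATION", "ACCESS_WIFI", "BLUETOOTH", "NFC",
                    "ACCESS_NETWORK", "READ_EXTERNAL_STORAGE", "READ_CALENDAR",
                    "CAMERA"},
    "NetworkID": {"ACCESS_NETWORK", "ACCESS_WIFI"},
    "GoogleID": {"GET_ACCOUNTS"},
    "BiometricsID": {"USE_FINGERPRINT", "CAMERA", "BODY_SENSORS", "ACCESS_AUDIO"},
    "PhoneNumber": {"GET_ACCOUNTS", "ACCESS_FINE_LOCATION", "CALL_PHONE", "SEND_SMS"},
    "Address": {"ACCESS_FINE_LOCATION", "CALL_PHONE", "SEND_SMS"},
    "Area": {"ACCESS_COARSE_LOCATION", "READ_EXTERNAL_STORAGE", "READ_CALENDAR",
             "ACCESS_WIFI", "ACCESS_NETWORK"},
    "SocialGraph": {"READ_CONTACTS", "READ_CALL_LOG", "PROCESS_OUTGOING_CALLS",
                    "RECEIVE_MMS"},
}

def identities_touched(app_permissions):
    """Partial identities that a set of requested permissions contributes to."""
    perms = set(app_permissions)
    return {pid for pid, attrs in PARTIAL_IDENTITIES.items() if perms & attrs}
```

For instance, an app requesting only GET_ACCOUNTS touches the GoogleID and PhoneNumber identities.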

2 Background

This section describes how the literature study was conducted and what its findings were.

The literature study was conducted to determine what studies have been done on privacy with regard to Android permissions. The goal is to find what has been done and what can be improved upon when visualizing Android permission usage to the user.

The literature study was a snowballing process, where first a start set of reports was decided. From there, an iterative process found which reports refer to the start set and which reports the start set refers to. This process is repeated for the new reports in the next iteration. To be included in the study, a report needs to fall into one of the following categories:

• Includes methods for visualizing android permission usage before the app is downloaded.

• Includes methods for classifying and detecting dangerous apps.

• Includes ways to manage privacy settings.

In addition to these criteria, if no new information is given in a report, i.e. it only uses methods already discussed, it is not included. Reports can be discarded based on title, abstract, conclusion or, lastly, the entire report.

2.1 Start set

Google Scholar was used to find a good start set of reports similar to what this paper wants to accomplish. Due to the vast number of reports on this subject, the searches made returned about 10,000 results; therefore, when a good number of reports had been found, the start set was considered complete. For this reason, the discard rules do not apply here, but start at the backward and forward snowballing.

Table 2: Start set findings

Paper Condition 1 Condition 2 Condition 3

[2] Yes No No

[3] Yes Yes No

[4] No Yes No

[5] Yes Yes No

[6] No Yes No

[7] Yes Yes Yes

[8] No Yes Yes

As seen in table 2, seven papers were chosen for the start set. The findings are listed below in terms of how the conditions were fulfilled.

2.1.1 Visualization methods

The first, and also simplest, visualization discovered was to use lists to display privacy information [2]. This information was shown to the user one click away from the download button. Due to this, multiple users never even saw the visualization. Another issue is that the list does not take into account how frequently a permission is used, only whether it is used or not. Nor is it an intuitive way of showing dangerous permissions, since there is nothing to compare with and no message saying whether the app is safe or unsafe. A more intuitive way is to use a privacy meter as a slide bar going from safe to dangerous [3].

This has the opposite downside to the previous method: it only shows a broad visualization with very little detail. The user will see a slide bar saying whether the app is safe or dangerous, but no further information on why. The same is true in [5], where eyes or doors coloured from green to red are suggested to visualize the permission usage. This, however, has one major upside: being just an image, it can be shown before the user even clicks on the application. This is also true in [7], where a warning triangle is used to signal dangerous applications.

2.1.2 Classification of dangerous apps

This is by far the hardest part of this topic, due to multiple factors. Firstly, what is considered appropriate behaviour is subjective to each user. Some users are very open about what permissions apps can gain access to, while some are very restrictive. Secondly, users approve of different behaviour from different apps, being more restrictive towards certain types of apps.

A primitive approach used in [3] is to count the number of dangerous permissions used by the app: the more used, the more dangerous it is. This, however, gives no information on what kind of danger the app poses (whether it gains access to location, accounts, etc.). One popular method was to include the user in the classification of dangerous apps [8] [7]. This is done by making the user choose which attributes they are concerned about the app collecting. If any of them are collected, the visualization is shown. [5] uses a sensitivity score, derived from the quantity of permissions used, and calculates the risk of giving up personal information.

A more out-of-the-box solution is to use natural language processing (NLP) [4] to analyze the privacy policy. The NLP isolates the part about what data is collected and calculates a result based on what is analyzed. This is an interesting approach, but if not configured carefully it can produce wrong results when privacy policies are worded differently. The last method found was to use cluster algorithms to group similar applications together on a 2D graph. From there, apps behaving differently will be further away from the center and can thereby be classified as dangerous.


2.1.3 Privacy management methods

The findings in this category were quite similar [7] [8]. Both use methods where the user can choose which permissions they feel are the most intrusive; from there, their privacy settings are managed to prioritize applications that do not collect those permissions.

2.2 Forward Snowballing

Table 3: Forward snowballing findings

Paper Condition 1 Condition 2 Condition 3

[9] No No Yes

[10] Yes No Yes

[11] Yes No No

2.2.1 Visualization methods

In [11], the number of permissions accessed is displayed to the user on the download page. In addition, the same graph compares the app to similar applications. In their example, a weather app is compared against the mean number of permissions of other weather apps. [10] describes three different levels of privacy management, where each level requires more work from the user. The three levels are reporting, fine-grained tuning and fencing; reporting falls under visualization methods, while the other two fall under privacy management methods and are discussed in detail below. Reporting is where the user is shown privacy information and reacts to it. However, this does not give the user the ability to disable the permissions. This was the case before Android 6.0, where a list of permissions was shown before download with the only options being agree or disagree. Reporting is, however, a good starting block for informing the user of privacy information.
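The comparison described in [11], where an app's permission count is set against the mean of similar apps, can be sketched as follows (the function name and return convention are illustrative):

```python
def permission_count_vs_category(app_count, category_counts):
    """Difference between an app's permission count and the mean count of
    apps in the same category (e.g. a weather app vs other weather apps).
    A positive result means the app requests more permissions than typical.
    """
    mean = sum(category_counts) / len(category_counts)
    return app_count - mean
```

A weather app requesting 10 permissions in a category averaging 6 would score +4, a signal worth surfacing on the download page.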

2.2.2 Privacy management methods

An app is designed in [9] where privacy information can be displayed to the user about which applications are using which permissions. It shows, for example, which apps have accessed your location in the last 14 days, and how many times. From there, there is an option to change your privacy settings. It was also found effective to send nudges to the users, prompting them to review or manage their privacy settings.

As mentioned in the previous section, fine-grained tuning and fencing are the second and third levels described in [10]. Fine-grained tuning is where the user can decide for themselves which permissions an app should have access to. This can be implemented in multiple ways, for example by letting the user disable each permission manually or by groups. Fine-grained tuning is done in Android 6.0 and onward, where the user is prompted at first app usage to give access to certain groups of requested permissions.

Fencing, the third and final level, is useful when an app requests a permission and will not work without it. One implementation of fencing is to fake the permission data, for example by giving a false location. This is only viable when the faked data is not integral to the application's functionality, even though the app refuses to run without the permission. For example, an app might crash when trying to display local advertisements without location data; if this is fenced, the app will still work as intended. However, a GPS app would not work properly with faked data.

2.3 Backward snowballing

Table 4: Backward snowballing findings

Paper Condition 1 Condition 2 Condition 3

[12] Yes No Yes

[13] No No Yes

[14] Yes No No

2.3.1 Visualization methods

The first finding was similar to the findings in the start set, where privacy information is displayed as text. However, in this example a label similar to a nutrition label was described [14]. Nutrition labels show how much protein, fat, sugar, etc. is in a product. To tailor this towards privacy policies, a matrix-like visualization was created, where each row is the data used and the columns are how the data is used. Each box can then be filled with four different markers symbolizing that the data is used this way, the ability to opt out of the usage, that the data will only be used if you opt in, and lastly that certain parts of that data can be opted in to or out from. This is a big step up from regular privacy policies, which are very time-consuming to read. It is not understandable at just a glance, but it is still a lot faster than regular privacy policies.

2.3.2 Privacy management methods

One of the goals of [13] was to show the possibility of giving users a small number of privacy profiles, despite users having different privacy preferences. The profiles discovered were the (Privacy) Conservatives (11.90 %), the Unconcerned (22.34 %), the Fence-Sitters (47.81 %) and the Advanced Users (17.95 %). The first two profiles are quite self-explanatory: they are either concerned (the Conservatives) or unconcerned (the Unconcerned) about giving away their privacy information. The Fence-Sitters are usually comfortable with giving away information for app functionality, but are more concerned when it comes to giving the data to third parties. The Advanced Users are the users who seemed to have a more nuanced understanding of the permissions and whether they were necessary or not. These profiles could theoretically be assigned to each user, by the user, to make them comfortable with what they are sharing.

3 Project setup

3.1 Project Specification

Since the project is very open-ended, there were not a lot of specifications, but there are some. Firstly, a literature review should be conducted to determine what has been done for privacy indicators and what does not work. For the practical part of the project, a mock app store should be created. This should simulate the Play Store in looks, but with some privacy indicators added. What these indicators are was not specified and is discussed in the design section. However, the privacy information will be calculated from the data of the app described in [15], [16]. This application logs every permission used by every application on the device. The log file created is then used as input for the privacy calculations.


3.2 Choice of tools

3.2.1 Programming languages

In a previous project [17], where the same log data was used, Python (version 3.6.5) was the main programming language. That project was somewhat similar to this one, which meant that some of the back-end logic could quite easily be transferred to this project if Python was used as the main language. For this to work, a library to parse Python code into an Android application would need to be found and implemented. When one was found, Python was chosen. If this had not been the case, C# in combination with Android Studio would have been used.

3.2.2 Environment

PyCharm [18] was selected as the programming environment. This IDE has all the features needed for this project and is easy to use. In addition, it has a great interface for using pip to install additional libraries.

3.2.3 Libraries

• Kivy 1.10.1 [19] is an open source library for creating cross-platform applications. Due to this, the application created can run on both Android and iOS. In addition to being a regular library, Kivy comes with its own language, which can be used in combination with the Python code. The Kivy language uses an XML-like syntax, where objects are used as tags to create the object and specify its attributes. The Kivy code can be written in Python as well, which is good when objects need to be created dynamically, but the structure is clearer when using the Kivy language.

• Numpy 1.16.2 [20] was used for the numerical calculations needed. This includes, but is not limited to, getting random values and getting unique elements in a list.

• Pandas 0.24.2 [21] provides the main data structure used in this project, called a data frame. It is a more powerful version of a table, more efficient and with more features for searching and inserting values.


• Matplotlib 3.0.3 [22] was used to display some of the privacy visualizations discussed in the design chapter. It is an easy and popular way of creating basic plots.

• Kivy garden [23] was used to create matplotlib graphs inside of the Kivy application.

• AppMonsta [24] is a framework for scraping metadata from the Google Play Store. This metadata includes the app name, download counts, app rating, description, etc. and is accessed by sending an HTTP request with the parameters corresponding to what you want to scrape from the Play Store. AppMonsta has a limit of 100 requests per day, which applies some restrictions to the design.

• Requests 2.21.0 [25] was used for making the HTTP requests to access the metadata.

• Re 2.2.1 [26] is a regular expression library used when strings needed to be parsed.

• Buildozer 0.39 [27] is a tool for creating application packages from the Kivy and python code. This is done through a buildozer.spec file which has all the specifications needed to create the application.

• PyInstaller 3.4 [28] is used to create a .exe file to be run on a PC.

• Adb Logcat [29] is used to debug the code when it is on the Android device. This is done by plugging the device into the PC and running adb logcat to get the entire log from the phone, which includes the error messages.

4 Design

4.1 Google Play Store

To make the design as intuitive as possible, the goal was to mimic the Play Store [30] as much as possible, but with the addition of a privacy indicator.

The Play Store has three pages, shown in figures 1, 2 and 3. The home screen is where recommended and trending applications are shown. The second page, which will be called the search page, is accessed when something is searched or a category is clicked; here, the applications relevant to the search are shown as a list. The third page is what will be called the app page. This page is entered when you click an application from either the home screen or the search screen. On the app page, information about the chosen application is shown, including screenshots, description, reviews, etc. This is also where the user is given the option to download the application.

4.2 Privacy Indicators

The privacy indicator must be implemented very carefully. If it is too weak, it will not be paid attention to; if it is too aggressive, it will portray the information wrongfully, making an application look worse than it is.

The idea is to have a simple indicator showing how dangerous the application is. This is the first indicator shown, but it can be expanded to show more privacy information in the form of various graphs. The first privacy indicator, the most basic in terms of visualization, has four different values, 1 - 4, where 1 is safest and 4 is very dangerous. Each score is visualized with a separate color from green to red, representing how dangerous the application is. Each value also has a unique text: safe, moderate, critical and very critical.
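The four-level indicator can be captured in a small lookup. A sketch under stated assumptions: the texts come from the report, but the two intermediate colours between green and red are placeholders of my choosing:

```python
# Privacy indicator levels: score -> (label, colour).
# Labels are from the report; the two middle colours are assumptions.
INDICATOR_LEVELS = {
    1: ("safe", "green"),
    2: ("moderate", "yellow"),
    3: ("critical", "orange"),
    4: ("very critical", "red"),
}

def indicator(level):
    """Return the (text, colour) pair for a privacy score of 1-4."""
    return INDICATOR_LEVELS[level]
```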

4.3 Privacy Score

To have a privacy indicator, some kind of score needs to be designed. This score should correspond to the indicator values described in Privacy Indicators. The attributes included in the score are permission count, partial identities accessed, and how much of each partial identity is accessed. Each contributes an equal amount to the total score: there are three parts, each scored from 1 - 4, and the total score is the mean of the partial scores. The attributes of the partial identity scores are shown to the user as two separate bar charts. There is more information that could interest the user but is not shown, such as a timeline of when during the day the permissions are accessed; since this does not contribute to the score, it was deemed to cause more confusion than help. How the partial scores are calculated is described below.
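The equal weighting of the three parts reduces to a plain mean. A minimal sketch; the report states only that the total is the mean of the three 1 - 4 sub-scores, so no rounding is applied here:

```python
def total_privacy_score(permission_score, pid_count_score, pid_score):
    """Equal-weight mean of the three partial scores (each in 1-4)."""
    return (permission_score + pid_count_score + pid_score) / 3
```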


Figure 1: Screenshot of the home page in the play store


Figure 2: Screenshot of the search page in the play store


Figure 3: Screenshot of the app page in the play store


4.3.1 Permission Count

The score from the permission count is calculated in proportion to the permission counts of the other applications in the metadata. If the application is in the top 25 % of permission counts, the score is 4; in the top 50 %, the score is 3; in the top 75 %, the score is 2; and the rest score 1. This is a very simple measurement, but it does the job it is supposed to do. The advantage of a method like this, where the score is calculated relative to the rest of the applications, is that the developer does not need to decide on a limit for what is safe and what is unsafe; instead, the app market dictates it.
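The quartile rule above can be sketched as follows. How ties are broken is not specified in the report, so counting strictly lower values is an assumption:

```python
def permission_count_score(app_count, all_counts):
    """Score 1-4 by the app's rank among all permission counts:
    top 25 % -> 4, top 50 % -> 3, top 75 % -> 2, rest -> 1.
    Tie handling (strictly lower counts) is an assumption.
    """
    below = sum(1 for c in all_counts if c < app_count)
    pct = below / len(all_counts)  # fraction of apps below this one
    if pct >= 0.75:
        return 4
    if pct >= 0.50:
        return 3
    if pct >= 0.25:
        return 2
    return 1
```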

4.3.2 Partial Identity count

This section has two parts: partial identities accessed directly and partial identities accessed indirectly. The score is twice as high if a partial identity was accessed directly rather than indirectly. The same solution as for the permission count was considered here as well. However, with only eight partial identities, many applications would share the same value; in addition, most applications access between three and five partial identities. With the values so compact, some scores might be skipped completely, which would not be ideal. Therefore a more static approach was chosen, with four intervals: 0 - 1, 2 - 3, 4 - 5 and 6 - 8. Each interval symbolizes how many partial identities have been accessed.
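The static intervals map directly to the four score levels. A minimal sketch; the direct/indirect weighting mentioned above is handled in the partial identity score and is not repeated here:

```python
def pid_count_score(n_identities):
    """Map the number of partial identities accessed (0-8) to a 1-4 score
    using the static intervals 0-1, 2-3, 4-5 and 6-8."""
    if n_identities <= 1:
        return 1
    if n_identities <= 3:
        return 2
    if n_identities <= 5:
        return 3
    return 4
```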

4.3.3 Partial Identity score

This score describes how much of the accessed partial identities has been accessed. For each partial identity accessed, a score is calculated where the direct partial identity attributes accessed are worth 2 points and the indirect attributes are worth 1 point. The score is then divided by the maximum score for that specific partial identity, i.e. the score if all the contributing attributes were accessed. The scores of all accessed partial identities are then summed and divided by the number of partial identities accessed. As with the permission count score, the severity of the partial identity score is decided relative to the rest of the scores: the bottom 25 % is safe, 25 % - 50 % is moderate, 50 % - 75 % is critical and the top 25 % is very critical.
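The per-identity normalisation and averaging described above can be sketched as follows. The data shapes (pairs of direct/indirect counts) are my own choice for the example, not the project's actual structures:

```python
def partial_identity_score(accessed, identity_attrs):
    """Average normalised coverage of the accessed partial identities.

    accessed:       {identity: (direct_hits, indirect_hits)} attributes accessed
    identity_attrs: {identity: (n_direct, n_indirect)} total attributes defined

    Direct attributes are worth 2 points, indirect ones 1 point; each
    identity's score is divided by its maximum (all attributes accessed),
    then the results are averaged over the accessed identities.
    """
    if not accessed:
        return 0.0
    total = 0.0
    for pid, (d_hit, i_hit) in accessed.items():
        n_d, n_i = identity_attrs[pid]
        total += (2 * d_hit + i_hit) / (2 * n_d + n_i)
    return total / len(accessed)
```

An identity with one direct and one indirect attribute, of which only the direct one is accessed, scores 2/3.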


Figure 4: An example of the log file used

4.4 Advanced statistics

At first, multiple ideas for advanced visualizations were considered, such as permission count, partial identity score, partial identity attributes and permission usage over a timeline. However, some of these visualizations are not considered in the overall score. Including them would be counter-intuitive, since the user would think they contributed to the overall privacy indicator, which is the main focus. It was instead decided that only factors contributing to the privacy score would be visualized, as a way of showing the user why the score is what it is. Partial identity count and partial identity score were bundled into one graph, while the permission count was made into a separate graph. Multiple types of graphs were considered, such as bar charts, pie charts and bubble charts. For both of these visualizations, a bar chart was deemed easier to understand than the other options. How easy the graph is to understand was an important factor, since the graphs should be viewable by anyone, and anyone should understand them at a glance.

4.5 Data handling

As mentioned previously, the input is a large amount of app profiling data that must be processed, currently 72 MB; see figure 4.

Ideally this should not be stored on the device, since it takes up a lot of precious space. The first idea was to parse this file so that each entry holds all the data needed for one specific application, such as download count, app name, rating, etc. However, when graphs were to be implemented, it became clear that this would not be a viable solution, due to the vast number of columns that would need to be added; in addition, it would be hard to extend. It was instead decided to keep the parsed file but have it contain only the metadata. This resulted in a file containing the following attributes for each application:
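The collapsing of raw log entries into per-app summaries can be sketched as below. The real log format (figure 4) is not reproduced here, so a simple tab-separated "app id, permission" line format is assumed for illustration:

```python
from collections import Counter, defaultdict

def summarise_log(lines):
    """Collapse raw profiling-log entries into per-app permission counts.

    Assumed entry format (illustrative; the real format follows the
    logging app described in [15], [16]): '<app_id>\t<permission>'.
    """
    per_app = defaultdict(Counter)
    for line in lines:
        app_id, permission = line.rstrip("\n").split("\t")
        per_app[app_id][permission] += 1
    return per_app
```

The resulting counts feed the permission count attribute of the metadata file described below.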


• Permission count, the total number of permissions called.

• App Id, the unique id used to reference the application.

• App Name, the app name shown to the user.

• Category, the category of applications the app belongs to.

• Rating, the user rating given to the app in the Play Store.

• Downloads, how many times the app has been downloaded.

• Icon, the app icon used in the app store.

• Screenshots, a list of URLs to the screenshots used in the Play Store to promote the application.

• Privacy rating, the calculated privacy rating for the application. This calculation is quite time-consuming and is therefore added to the metadata instead of being calculated when needed.

• Description, the description of the application from the Play Store.
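One metadata entry could be modelled as a small record type. A sketch only: the field names mirror the list above, but the actual file layout used in the project is not specified here:

```python
from dataclasses import dataclass

@dataclass
class AppMetadata:
    """One entry of the parsed metadata file (field names are illustrative)."""
    app_id: str
    app_name: str
    category: str
    permission_count: int
    rating: float
    downloads: int
    icon: str
    screenshots: list        # URLs of promotional screenshots
    privacy_rating: float    # precomputed: the calculation is time-consuming
    description: str
```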

In addition to the metadata file, the log data was also stored on the device. Despite taking up quite a lot of storage, this was deemed a good enough solution for this project. A better solution would have been to access the data from a server, removing the storage problem completely. However, that takes time and is not the focus of the project, so it was noted as an improvement to be added if time allowed.

4.6 User study

To test the effectiveness of the privacy indicator, a small-scale user study was created with two groups. The first group was the control group, which was not shown any privacy indicators, while the second group was the study group, where all of the indicators were shown. The goal of the study is to find differences in which apps are downloaded, and whether the privacy indicator has anything to do with it. One important thing to remember is to take user bias into account: users tend to choose applications they have had good experiences with before.

The test is split into three parts, where the first two parts are identical for both groups. First, a short survey with the following questions:

• Q1: How important is the download count when you are deciding to download an application? (Scale 1 - 5)


Figure 5: The Run time view of the app store


• Q2: How important is the app rating when you are deciding to download an application? (Scale 1 - 5)

• Q3: How important is the privacy policy when you are deciding to download an application? (Scale 1 - 5)

• Q4: Do you change what permissions apps are allowed to use? This has the following options:

  – Yes, at first start when prompted.

  – Yes, after using the application.

  – No.

For the second part, the user is greeted by the application and asked to download an application for communicating with people. How they want to communicate is up to them: a video call application, a text message application, a contact manager or even a web browser is all fine.

The last part of the test is another short survey. The first two questions are for both groups, while the last two are only for the study group, since they are questions about the privacy indicators. These are the questions asked:

• Q5: What application did you choose?

• Q6: What were the reasons for you to choose this application?

• Q7: Did you view the detailed graphs?

• Q8: How much did you take the privacy indicator into account? (Scale 1 - 5)

The last question is very similar to the question about the privacy policy in the first survey, with the hypothesis that the indicator has a bigger impact than the privacy policy. In the study, users are not told what the privacy indicators are, because no one would tell them when downloading an app from the real Play Store. However, if they ask about the indicators, they are given information about what they are.

5 Implementation

5.1 Code structure

One of Kivys great feature is its separation of logic and visualization. The Kivy language has an easy to use structure when creating objects. The plan

(24)

was to only use this as the front end, and let the python code be purely back end. However, it was discovered that it is way easier to dynamically place objects with python. This is necessary for example when creating the applications on the app page. Where Kivy shines however is the static objects, such as the layouts needed, the search bar, permanent buttons etc.

Because of this, the Kivy code should be used for static objects and Python for dynamic objects. However, due to limited knowledge of Kivy from the start, these rules were broken at certain points when no other solution could be found at the time. To use multiple screens, a screenManager object needs to be used. This is a built-in Kivy object responsible for switching between different screens. It does not, however, have built-in support for sending data between screens, which is necessary for this project. To solve this, a new class called stateManager was created. This class inherits from screenManager and contains an object with all of the data which needs to be used by multiple screens: the app id, the raw data, the metadata and whether the privacy indicators should be shown.
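The shared-state pattern described above can be sketched as follows. This is a minimal illustration, using a stand-in base class in place of Kivy's actual ScreenManager so the idea is shown without the Kivy dependency; the attribute names in `shared` are illustrative, not necessarily the thesis's exact ones.

```python
# Minimal sketch of the shared-state pattern. A stand-in base class
# replaces kivy.uix.screenmanager.ScreenManager; the keys stored in
# `shared` are illustrative names for the data described in the text.

class ScreenManager:
    """Stand-in for kivy.uix.screenmanager.ScreenManager."""

    def __init__(self):
        self.current = None  # name of the currently displayed screen


class StateManager(ScreenManager):
    """ScreenManager subclass carrying data needed by multiple screens."""

    def __init__(self):
        super().__init__()
        self.shared = {
            "app_id": None,        # id of the application being viewed
            "raw_data": None,      # raw permission-usage records
            "meta_data": None,     # metadata fetched from appMonsta
            "show_privacy": True,  # whether privacy indicators are shown
        }


manager = StateManager()
manager.shared["app_id"] = "com.example.app"
manager.current = "app_page"  # any screen can now read manager.shared
```

Because every screen is reachable through the manager, the screens never pass data to each other directly; they read and write the single shared object instead.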

5.1.1 System architecture

The architecture of the system is split into three layers: the presentation layer, the logic layer and the data layer. The presentation layer is responsible for displaying everything to the user, including the apps, privacy ratings, graphs, descriptions, etc. The logic layer has the task of filtering the data and presenting it in a way that the presentation layer can interpret simply. The data layer contains only raw data, which is read by the logic layer. The task of each module is described below and in figure 6.

• KAUdroidappstore sets up the static objects in the app store, including buttons, layouts, backgrounds, etc.

• Main is the file which is run when the application starts. It has the task of creating the dynamic objects, which include app buttons, graphs, privacy scores, ratings, etc.

• MatplotlibTest creates the graphs which are shown by main.

• Partial id contains the back end of the partial identities, including definitions of what is included in which partial identity and some helper functions to perform calculations on it.

• Metadata.py is used to access the metadata from appMonsta. It only contains one function, which returns an object with the specified values.

Figure 6: System architecture

• FilterData is the main module in the logic layer. It contains all the filtering of the metadata and sampledata files. In addition, it writes to the metadata file when it needs to be updated.

• Metadata.csv contains the metadata calculated and gathered from appMonsta.

• Sampledata contains all of the raw data. It is very large, containing one entry each time an application has requested a permission.

5.2 Patching to APK File

To patch the code to work on an Android device, an APK file needs to be created. This is done using buildozer. With buildozer, a buildozer.spec file is created, which is modified to include all of the code's dependencies.

The spec file is also used to specify the app name, permissions, orientation of the phone, etc. Once buildozer had finished, the APK file was added to the phone and installed. The app crashed at startup and was debugged with adb logcat, which revealed that some dependencies of the libraries used were not installed. These were added to the spec file and a new APK was created.
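The spec-file changes described above might look like the following excerpt. The keys are standard buildozer.spec fields, but the title, package name and requirements list are assumptions about this project, not the thesis's actual values:

```ini
# Illustrative buildozer.spec excerpt; title, package name and the
# requirements list are assumed, not taken from the thesis.
[app]
title = KAUdroid App Store
package.name = kaudroidappstore
package.domain = se.kau
requirements = python3,kivy,numpy,pandas,matplotlib,requests
orientation = portrait
android.permissions = INTERNET
```

Each library missing at runtime is added to the `requirements` line before rebuilding.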

The same process was repeated and the app still crashed. Through logcat it was found that two of the libraries used, matplotlib and pandas, were not compatible with python-for-android, which buildozer uses in the background. This can be solved by adding recipes for those libraries to python-for-android. However, that takes time to learn and implement, which at this point was not an option. It was instead decided to create an exe file and perform the tests on a PC. The downside of this is that the interface is still designed for use on a phone, which makes it less intuitive.

5.3 Patching to exe File

Creating an executable file is very similar to creating an APK file. It is done using PyInstaller, which first creates a .spec file. All of the project dependencies are added to this file. Once done, PyInstaller was executed on the spec file and a folder with an executable was created. This folder also includes all of the libraries used in the project. The only package not included was kivy.garden.matplotlib, which is used to show matplotlib graphs in the Kivy application. This had to be added manually instead. Once added, the executable worked smoothly.

Figure 7: Start page

5.4 User interface

When the application is started, the user is prompted to select whether they want to show the privacy settings or hide them, see figure 7. This is implemented with a flag that is checked when creating the privacy indicators. Once one of the options has been chosen, the rest of the application starts to build.

When finished building, the user is greeted with the home page. The home page consists of a box layout with three boxes from top to bottom. The first box is the search bar, where the user can search for an application. The search algorithm is described in more depth in section 5.5. The second box contains a box layout within a scrollview, making it possible to scroll through all of the categories. The last box has a scrollview containing all of the applications currently searched for (everything by default). Each of the applications consists of a button covering the entire box, an image of the application, the app rating, the name of the app and the calculated privacy score. The privacy score is also a button which, if pressed, shows more detailed information about why the score is what it is.

Figure 8: Home page

If an application is pressed, the user is taken to the app page. This screen also contains a box layout with three boxes within a scrollview. The first box contains the app name, the app image, the privacy rating, a home button, an install button and a button to show some screenshots from the application. Ideally, the screenshots would be shown in a box by themselves. However, I was not able to create a scrollable object within a scrollable object. Why this is the case is still unknown, but the solution decided upon was to show them on a separate screen via another button instead. The home button is another compromise due to not being able to patch the code to an APK file. If used on a phone, the phone's back button could have been used; instead, a button to go to the home page had to be created.

The second box is very simple and contains only the app rating and the download count. In the real play store, it also contains the PEGI rating. This did not feel necessary to include in this project, since all of the participants were over the age of 18.

The last box contains the description of the application. This description is gathered from appMonsta as a string. Because of this, the indentation and line breaks of the description are not perfect. In the description string, only the line breaks were indicated and could therefore be parsed quite easily; tabs, however, were not indicated and could therefore not be reconstructed to make the text more readable.

If the "Show Screenshots" button is pressed, the user is greeted by multiple screenshots from the chosen application. The screenshots are scrollable sideways, designed this way to mimic the play store. However, when run on a desktop, this provides some awkward interactions, since the scrolling is done by clicking and dragging with the mouse. There is also a back button to get back to the app page. Since the screenshots were designed to be viewed on a mobile device and not a desktop, the resolution scaling is a bit off.

The last button which can be pressed is the privacy indicator itself. The resulting page contains the two graphs described in section 4.4. Here too there is a back button, which always goes back to the app page, even if the indicator was pressed from the home page.


Figure 9: App page


Figure 10: Screenshots page


Figure 11: Download count graph


Figure 12: Partial Id score graph


5.5 Filter Applications

There are two ways of filtering applications. The first is to use the categories at the top of the main page, which are given from the play store by appMonsta. The second is to search for an application using the search bar. In the back end, both methods use the same function to search for an application; if a category is selected, that category is searched for. Once the search function is called, it looks for the phrase anywhere in the metadata file. If it is contained, that application is shown to the user.
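The shared search described above can be sketched as a simple case-insensitive substring lookup. This is a simplified illustration; the row layout and field contents are assumptions, not the actual metadata.csv schema:

```python
# Simplified sketch of the shared search function: the search bar and
# the category buttons both call the same substring lookup over the
# metadata rows. The row layout here is illustrative.

def search_apps(query, metadata_rows):
    """Return every row that contains the query anywhere in any field."""
    query = query.lower()
    return [row for row in metadata_rows
            if any(query in field.lower() for field in row)]


rows = [
    ["Messenger", "Communication", "Text, voice and video calls"],
    ["Discord", "Communication", "Chat for communities"],
    ["Sudoku", "Games", "Number puzzle"],
]

search_apps("messenger", rows)      # free-text search from the search bar
search_apps("Communication", rows)  # a category click uses the same call
```

Because a category click is translated into an ordinary search for the category name, no separate filtering code path is needed.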

6 User study

The user study had 15 participants, 12 male and 3 female, who performed the test described in section 4.6, split into 7 in the control group and 8 in the study group. Most of the test subjects have some kind of background in computer science, but that was not a requirement. The recruitment process was a mixture of reaching out to friends and asking at the end of my supervisor's lecture whether anyone was interested in taking the study. The test took place in a controlled environment where the test subjects were asked four questions (Q1 - Q4) before performing the test. Once the test was performed, the test subjects answered 4 or 2 additional questions: the study group was asked two questions about the privacy indicator, which is why they answered 4 questions and the control group only 2. There was no compensation for participating in the study.

6.1 Both groups

• Q1: The most popular answer was 4, with 40% of the participants. 20% answered 2 and 3 respectively, 13% answered 1 and only 6.7% answered 5.

• Q2: 4 was the most popular answer here as well, with 53% of the votes. 5 had 20%, 3 had 13%, while 1 and 2 had 6.7% each.

• Q3: 1 and 2 had 33.3% each, while 3 and 4 had 13% each and 5 had 6.7%.

• Q4: 20% answered no, while 40% each answered "Yes, after using the application" and "Yes, when prompted on app start".


Figure 13: Control group question 1

The following two sections, 6.2 and 6.3, discuss the individual responses in detail.

6.2 Control group

The most downloaded application was Messenger, with 43% and a privacy rating of "Critical". The following are the reasons the users gave for downloading it:

• Safe bet

• Possibility for fast communication with only needing to know the name of the person I want to contact. It is possible to communicate through text, speech and video. It is also possible to send files.

• The application I use the most, works for everything except mail.

The second most popular application was Discord, chosen 19% of the time; everyone gave a variation of known application/good rating as the reasoning for downloading it. Skype and Telegram each had 14% of the choices. Telegram had the reasoning of a good rating, many downloads and making a secure connection between users. For Skype, the reasoning was that Microsoft is trusted by the user.


Figure 14: Control group question 2

Figure 15: Control group question 3


Figure 16: Control group question 4

Figure 17: Control group question 5


6.3 Study group

For this group, 50% chose Messenger as their preferred application. These were the reasons:

• The application I mostly use for talking to others.

• Other people uses it

• Familiar with the application since before and what I use for communication

• Usefulness

The rest of the results were a four-way tie between Discord (Moderate), Messenger Lite (Moderate), WhatsApp (Very Critical) and Signal (Critical). These are the reasons for each app:

• Discord: known branding, experience with the app, and the "security"/permission addition looked decent.

• Signal: has good encryption.

• WhatsApp: I've used it before.

• Messenger Lite: the app I use the most for communication online.

Only one user viewed the detailed graphs (12.5%). When asked how much they took the privacy indicator into account (Q8), the following results were given: 50% answered 1, 25% answered 2, and 12.5% answered 3 and 4 respectively.

The survey contains two very similar questions, Q3 and Q8. If the scores of both are summed up, using only the study group's results, Q3 gets a score of 15, and Q8 gets a score of 15 as well. The individual responses to the two questions are also very similar, with a maximum difference of one point.
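The summed Q8 score can be reproduced from the reported distribution: with 8 study-group participants, 50% corresponds to 4 people answering 1, 25% to 2 people answering 2, and 12.5% to 1 person each answering 3 and 4.

```python
# Reproducing the summed Q8 score from the reported distribution
# (8 study-group participants).
q8_responses = {1: 4, 2: 2, 3: 1, 4: 1}  # answer -> participant count
q8_sum = sum(answer * count for answer, count in q8_responses.items())
print(q8_sum)  # -> 15, matching the summed Q3 score for the study group
```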

6.4 Discussion

The goal of this report was to create a system which nudges users into selecting privacy friendly applications. At first glance of the results, this doesn't seem to be the case, with Q3 and Q8 having very similar results. However, this does not take into account the ease of use of the privacy indicator: it is a simple-to-understand indicator which can be shown without the need to read the privacy policy. The fact that the results of the two questions were very similar shows that the indicator mainly affects the people who are already interested in the privacy policy, which can be viewed as a positive: if a user is not interested in privacy, the indicator should not deter them from downloading the application, and this is what the results suggest.

Figure 18: Study group question 1

Figure 19: Study group question 2

Figure 20: Study group question 3

Figure 21: Study group question 4

Figure 22: Study group question 5

Figure 23: Study group question 7

Figure 24: Study group question 8

The results also show that users have a very strong bias towards applications they are already using, since almost everybody chose such an application. Even when the participants chose an application they were familiar with, they took the indicator into account if the privacy policy was important to them. One idea to minimize this bias was to remove some of the most popular applications from the app store, such as Messenger, Discord and WhatsApp. This would force users to look more into some of the less known applications and choose one of those. However, if a similar privacy indicator were to be released on the play store, the bias would still exist, making it unrealistic to remove them from the experiment.

A more interesting result is the fact that only one user viewed the detailed graphs, shown in figures 11 and 12. This probably has multiple causes; for example, the only way to view them is by pressing the privacy indicator, which doesn't really look like a button but more like a label. Another factor is that the users were not told about this feature, in order to see how they would use the app store in a real-life scenario; the results would probably have been different otherwise. The reason for this, and also why the users were not told what the privacy indicator meant, was to get the most realistic result without influencing the users by talking too much about privacy prior to the test.

The user study could have been done more rigorously; however, with the issues described in section 5.2 plus a few issues with getting the study conducted, time was running out for this project. Because of this, the study was kept quite minimalistic and only asked the users to download one application. If more time had remained, they would probably have been asked to download two applications, ranked 1 and 2, for two different purposes (communication and one other). This would have given more and better results but would also have taken a lot longer to analyze.

An important note about the study is that it is a small-scale study, meaning that all of the results must be taken lightly and nothing definitive can be said. The results can, however, be used to form a first opinion about the privacy indicator.

7 Conclusion

This project consisted of two parts: first, conducting a literature review of privacy indicators to obtain background information, and secondly, answering the research questions for the project. The project had two research questions, RQ1 and RQ2.

• RQ1: How can risk get conceptualized from app profiling data?

• RQ2: How can this information in useful ways inform users before installation?

RQ1 is answered by using a combination of the partial identities accessed and the permission usage of each application. From this, a score from 1 to 4 is calculated; the higher the score, the more dangerous the application. The scoring system is not perfect, but it still provides adequate information on the permission usage of each application.

RQ2 is answered by creating a mock app store with the added privacy information calculated in RQ1. This information is displayed for each application on both the home page and the app page with the text Safe, Moderate, Critical or Very Critical, depending on the score. It is also colored on a scale from green to red, giving an intuitive image of how dangerous the app is according to the definitions from RQ1.
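The score-to-indicator mapping can be sketched as below. The four labels come from the text above; the RGB values on the green-to-red scale are illustrative assumptions, since the exact colours used in the mock app store are not specified here:

```python
# Hedged sketch of the score-to-indicator mapping: labels are taken
# from the text, the green-to-red RGB values are illustrative.
LABELS = {1: "Safe", 2: "Moderate", 3: "Critical", 4: "Very Critical"}
COLORS = {
    1: (0.0, 0.8, 0.0),  # green
    2: (0.8, 0.8, 0.0),  # yellow
    3: (1.0, 0.5, 0.0),  # orange
    4: (0.9, 0.0, 0.0),  # red
}


def privacy_indicator(score):
    """Return the (label, colour) pair for a privacy score from 1 to 4."""
    return LABELS[score], COLORS[score]


privacy_indicator(2)  # -> ("Moderate", (0.8, 0.8, 0.0))
```

A single lookup like this keeps the home page and the app page consistent, since both render the indicator from the same score.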

A user study was conducted to test the effectiveness of the privacy indicator. In the study, it was discovered that users who already care about the privacy policy take the privacy indicator more into account than people who don't care about it. The indicator is, however, much easier to form an opinion around at first glance, due to its lightweight nature. The effectiveness of the advanced graphs can't be determined, since only one participant viewed them.


There were a few issues during the implementation phase which affected the design of the application. These include not being able to create a scrollable object within another scrollable object, resulting in the screenshots being moved to a separate page, and the use of libraries which couldn't be patched into an APK file, resulting in a few more buttons in the UI.

7.1 Future Work

There is always more work to be done. In addition to what has been done, possible improvements include the following:

• Make the application web based. In its current state, the entire app store, including all the metadata (except the screenshots), needs to be downloaded to the device and takes up a lot of space. If it were web based, only a lightweight GUI application would be needed, fetching all the data from a server.

• Make the application work on mobile devices. This was one of the problems during this project. It could probably be combined with making the store web based, since a phone would be able to access the server as well.


A Appendix

This appendix contains the questions for the user study.

• Q1: How important is the download count when you are deciding to download an application? (Scale 1 - 5)

• Q2: How important is the app rating when you are deciding to download an application? (Scale 1 - 5)

• Q3: How important is the privacy policy when you are deciding to download an application? (Scale 1 - 5)

• Q4: Do you change what permissions apps are allowed to use?

• Q5: What application did you choose?

• Q6: What were the reasons for you to choose this application?

• Q7: Did you view the detailed graphs? (Only study group)

• Q8: Did you take the privacy indicators into account when downloading? (Scale 1 - 5, only study group)
