Faculty of Economic Sciences, Communication and IT
Computer Science

Karlstad University Studies 2011:25

Stefan Berthold

Towards Inter-temporal Privacy Metrics


Stefan Berthold. Towards Inter-temporal Privacy Metrics. Licentiate thesis.

Karlstad University Studies 2011:25
ISSN 1403-8099
ISBN 978-91-7063-357-7
© The Author

Distribution: Karlstad University, Faculty of Economic Sciences, Communication and IT, Computer Science, S-651 88 Karlstad, +46 54 700 10 00, www.kau.se


Towards Inter-temporal Privacy Metrics

Stefan Berthold

Department of Computer Science, Karlstad University

Abstract

Informational privacy of individuals has significantly gained importance since information technology became widely deployed. Data, once digitalised, can be copied and distributed at negligible cost. This has dramatic consequences for individuals, who leave traces in the form of personal data whenever they interact with information technology. The right of individuals to informational privacy, in particular to control the flow and use of their personal data, is easily undermined by those controlling the information technology.

The objective of this thesis is to study the measurement of informational privacy with a particular focus on scenarios where an individual discloses personal data to a second party, the data controller, which uses this data for re-identifying the individual within a set of others, the population. Several instances of this scenario are discussed in the appended papers, most notably one which adds a time dimension to the scenario for modelling the effects of the time passed between data disclosure and usage. This extended scenario leads to a new framework for inter-temporal privacy metrics.

The common dilemma of all privacy metrics is their dependence on the information available to the data controller. The same information may or may not be available to the individual and, as a consequence, the individual may be misguided in his decisions due to limited access to the data controller’s information when using privacy metrics. The goal of this thesis is thus not only the specification of new privacy metrics, but also the contribution of ideas for mitigating this dilemma. However, a solution will rather be a combination of technological, economic, and legal means than a purely technical solution.

Keywords: privacy, unlinkability, metrics, uncertainty, valuation process, domain-specific language, anonymous communication.


Acknowledgements

First of all, I would like to thank my supervisor Prof. Simone Fischer-Hübner and my co-supervisor Prof. Stefan Lindskog for their steady support, interesting discussions, and advice during my work at Karlstad University.

I am also grateful to Prof. Viiveke Fåk for accepting the role of the opponent in my licentiate thesis defence.

I would like to thank my colleagues in the PriSec group at Karlstad University, particularly Hans Hedbom, Reine Lundin, Tobias Pulls, and Ge Zhang; and my former colleagues at Technische Universität Dresden, particularly Rainer Böhme, Sebastian Clauß, Thomas Gloe, Matthias Kirchner, Stefan Köpsell, Stefanie Pötzsch, and Antje Winkler, for challenging me in inspiring discussions, pointing me to interesting questions, and collaborating with me.

Thanks also to all my colleagues at the Department of Computer Science for providing such a friendly working environment.

A huge thank-you also to my mum, Erika Berthold, and my sister, Susann Berthold, for always being available; and to all my friends in Karlstad, in Dresden, and in quite a few other places around the world for keeping in touch with me, even over long distances.

Research leading to this work has received funding from the European Community’s Sixth Framework Programme (FP6/2002–2006) under grant agreement no. 507512 and the Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 216483, as well as from the Research Council of Norway through the PETweb II project and the German Academic Exchange Service (DAAD).


List of Appended Papers

I. Stefan Berthold and Sebastian Clauß. Linkability Estimation Between Subjects and Message Contents Using Formal Concepts. In Proceedings of the 2007 ACM Workshop on Digital Identity Management (DIM), ACM Digital Library, New York, 2007.

II. Stefan Berthold, Rainer Böhme, and Stefan Köpsell. Data Retention and Anonymity Services—Introducing a New Class of Realistic Adversary Models. In The Future of Identity in the Information Society, IFIP Advances in Information and Communication Technology, pages 92–106, Springer, 2009.

III. Stefan Berthold and Rainer Böhme. Valuating Privacy with Option Pricing Theory. In Tyler Moore, David Pym, and Christos Ioannidis (eds.), Economics of Information Security and Privacy, book chapter, Springer, 2010.

IV. Stefan Berthold. Towards a Formal Language for Privacy Options. In Post-proceedings of the PrimeLife/IFIP Summer School, Springer. To appear 2011.

Comments on my Participation

Paper I I was the main author of all parts of the paper and received a number of useful comments from Sebastian Clauß.

Paper II This paper was a joint work of all three authors. I was the driving force behind the first ideas of this paper and later took the responsibility for adding the sections on the attacks and bringing the theoretical ideas together with the study in a coherent paper. Stefan Köpsell carried out the empirical study and drafted the results of it in German. He was also the main author of the sections about anonymity services in a nutshell and the legal background. Rainer Böhme contributed the section about probabilistic intersection attacks and supervised the study.

Paper III This paper was a joint effort with Rainer Böhme. I authored the main part of the paper, i. e., the related work on anonymity and unlinkability as well as Sections 3, 4, 5, and 6. Rainer authored the introduction, the related work on financial methods in information security, and the conclusions; and contributed his knowledge about economics to the main ideas of this paper.

Paper IV This paper was solely authored by myself. During the course of writing the paper, I received many constructive and motivating comments from colleagues and friends which have been acknowledged in the paper.


Selection of Other Peer-reviewed Publications

• Stefan Berthold. Possibilistic Disclosure Attacks in Polynomial Time. In Pre-proceedings of the FIDIS/IFIP Internet Security & Privacy Summer School, Masaryk University, Czech Republic, 2008.

• Ge Zhang and Stefan Berthold. Hidden VoIP Calling Records from Network Intermediaries. In Proceedings of the Principles, Systems and Applications of IP Telecommunications (IPTComm 2010), Munich, Germany, 2010.

• Stefan Berthold and Sebastian Clauß. Privacy Measurement. In Jan Camenisch, Ronald Leenes, and Dieter Sommer (eds.), Digital Privacy – PRIME – Privacy and Identity Management for Europe, LNCS, Vol. 6545, Springer Verlag, 2011.

Contributions to Project Deliverables

• Stefan Berthold and Sebastian Clauß. Metrics. In Marek Kumpošt, Vashek Matyáš, and Stefan Berthold (eds.), Privacy modelling and identity, FIDIS Deliverable 13.6, 2007.

• Martin Meints, Stefan Köpsell, and Stefan Berthold. Data Retention. In Stefan Berthold and Sandra Steinbrecher (eds.), Estimating Quality of Identities, FIDIS Deliverable 13.9, 2008.

• Stefan Berthold and Sebastian Clauß. Influence of misinformation on anonymity metrics. In David-Olivier Jaquet-Chiffelle, Bernhard Anrig, and Emmanuel Benoist (eds.), Applicability of privacy models, FIDIS Deliverable 13.8, 2008.

• Stefan Berthold and Martin Meints. Related Work on Secure and Privacy-preserving Logging. In Günther Müller and Sven Wohlgemuth (eds.), From Regulating Access Control on Personal Data to Transparency by Secure Logging, FIDIS Deliverable 14.6, 2008.

• Rainer Böhme, Stefan Berthold, Stefanie Pötzsch, and Hans Hedbom. Technological TETs. In Mireille Hildebrandt (ed.), Biometric Behavioural Profiling and Transparency Enhancing Tools, FIDIS Deliverable 7.12, 2008.

• Stefan Köpsell and Stefan Berthold. Public Key Infrastructure. In Kai Rannenberg, Denis Royer, and André Deuker (eds.), The Future of Identity in the Information Society: Challenges and Opportunities, FIDIS Summit Book, 2009.

• Stefan Köpsell and Stefan Berthold. Electronic Signatures. In Kai Rannenberg, Denis Royer, and André Deuker (eds.), The Future of Identity in the Information Society: Challenges and Opportunities, FIDIS Summit Book, 2009.


Contents

List of Appended Papers

Introductory Summary

1 Introduction
2 Background and Related Work
2.1 Anonymity
2.2 Unlinkability
2.3 Information Asymmetry
2.4 Privacy Policy Languages
3 Research Questions
4 Research Methods
5 Contributions
6 Summary of Appended Papers
7 Conclusions and Future Work

Paper I – Linkability Estimation Between Subjects and Message Contents Using Formal Concepts

1 Introduction
2 Formalization of Messages and Contents
2.1 Message Lattice
2.2 Data Lattice
2.3 Improve Message Contents by Derivable Data
2.4 Intermediate Results
3 Deducing Subject Knowledge
3.1 Subject–Pseudonym Lattice
3.2 Assigning Subjects to Messages
3.3 Contents towards Subject Knowledge
3.4 Intermediate Results
4 Composing Contents and Knowledge in One Lattice

Paper II – Data Retention and Anonymity Services—Introducing a New Class of Realistic Adversary Models

1 Introduction
2 Anonymity services in a nutshell
3 Legal background
4 Cross-section attack
5 Intersection attack
6 Setup of our study on intersection attacks
6.1 Preparation of the AN.ON client software
6.2 Formal notation
7 Results of our study on intersection attacks
8 Probabilistic intersection attacks
9 Conclusions

Paper III – Valuating Privacy with Option Pricing Theory

1 Introduction
2 Related Work
2.1 Measurement of Anonymity and Unlinkability
2.2 Financial Methods in Information Security
3 From Financial to Privacy Options
4 Sources of Uncertainty
4.1 Micro Model: Timed Linkability Process
4.2 Macro Model: Population Development
5 Valuation of Privacy Options
6 Discussion of Results

Paper IV – Towards a Formal Language for Privacy Options

1 Introduction
2 Privacy Option Language
3 POL Contract Management
4 Converting POL Contracts to PPL Sticky Policies
5 Canonical Form Contracts in POL
6 Extensions

Introductory Summary

“O glücklich, wer noch hoffen kann,
Aus diesem Meer des Irrtums aufzutauchen!
Was man nicht weiß, das eben brauchte man,
Und was man weiß, kann man nicht brauchen.”

[O happy he who still can hope to surface from this sea of error! What one does not know is just what one would need, and what one knows, one cannot use.]

— Faust, in Faust. Der Tragödie erster Teil (1808)
Johann Wolfgang von Goethe


1 Introduction

Determining when people first thought about their privacy is probably impossible, but it is reasonable to assume that people have ever since sought to assess which actions lead to better privacy and which to worse. The ways of measuring privacy depend on the aspect of privacy, e. g., physical privacy or informational privacy. In this thesis, we focus on the latter, informational privacy.

Informational privacy of individuals has significantly gained importance since information technology became widely deployed. Data, once digitalised, can be copied and distributed at negligible cost. This has dramatic consequences for individuals, who leave traces in the form of personal data whenever they interact with information technology. The right of individuals to informational privacy, in particular to control the flow and use of their personal data, is easily undermined by those controlling the information technology. Privacy metrics can help individuals to evaluate the consequences of disclosing personal data and thus help them to make informed decisions.

The objective of this thesis is to study the measurement of informational privacy with a particular focus on scenarios where the individual discloses personal data to a second party, the data controller, which uses this data for re-identifying the individual within a set of others, the population. Several instances of this scenario are discussed in the appended papers, most notably Paper III which adds a time dimension to the scenario for modelling the effects of the time passed between data disclosure and usage. This extended scenario leads to a new framework for inter-temporal privacy metrics.

The common dilemma of all privacy metrics is their dependence on the information available to the data controller. The same information may or may not be available to the individual and, as a consequence, the individual may be misguided in his decisions due to limited access to the data controller’s information when using privacy metrics. The goal of this thesis is thus not only the specification of new privacy metrics, but also the contribution of ideas for mitigating this dilemma. However, a solution will rather be a combination of technological, economic, and legal means than a purely technical solution, as discussed in Paper IV.

The remainder of the introductory summary is structured as follows. Section 2 introduces the basic terminology and concepts we use and refer to in this thesis and discusses related work. Section 3 defines the research questions addressed in this thesis, followed by a brief overview of research methods in Section 4. Section 5 summarises the contributions made in the appended papers to the research field of privacy metrics. Section 6 summarises the appended papers. Section 7 summarises the conclusions and outlines future work.

2 Background and Related Work

This section summarises the basic terms and concepts used in this thesis and refers to the related work. Since the central concept of this thesis is informational privacy, we start the background section by defining it. Though there is a large body of work using the term, there is no consolidated definition of privacy. One of the earliest definitions in modern scientific literature was given by Westin, who has defined privacy as

“the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” [45]

The German Federal Constitutional Court has defined privacy as the right of informational self-determination:

“[The] constitutional right of informational self-determination guarantees the permission of an individual to determine on principle himself on the disclosure and use of his personal information.” [20]

These notions have been widely adopted and we will thus not redefine informational privacy, but rather focus on the facets of it which are discussed in the appended papers.

2.1 Anonymity

Pfitzmann and Hansen [32] maintained a terminology overview for research in the informational privacy domain. The continuous updates of this terminology have been widely adopted in other scientific work since its first publication in 2000 [31]. Pfitzmann and Hansen define anonymity as follows:

“Anonymity of a subject means that the subject is not identifiable within a set of subjects, the anonymity set.” [32]

The subjects in this definition are the actors in a network that either send or receive messages. If, for instance, the sending subject of a message cannot be determined among a set of other sending subjects, then the sending subject of this message is anonymous, and the set of sending subjects in which the sender of the message is hiding is the anonymity set.

The size of the anonymity set is a measurement of anonymity. For sizes greater than 1, the sending subject is considered anonymous. An anonymity set size of exactly 1 means that the sending subject is identified.

Though not explicitly mentioned in [32], the anonymity set notion was most likely motivated by the way anonymity was established in Chaum’s DC network [9]. Chaum motivated his work with an artificial problem which he referred to as the dining cryptographers problem: one message should be sent and unblinded publicly without anyone learning who the sender was. His solution to this problem is the DC network, illustrated in Figure 1: each user, except one, encrypts an empty message with a one-time pad [44]. The remaining user encrypts his non-empty message with his one-time pad key and submits it to the network as well. The keys are chosen such that their bitwise sum is zero. Due to the one-time pad encryption and the specific way of decrypting, i. e., summing up all messages, it is not possible to discern the non-empty message and the empty ones, even after decrypting. Consequently, all users are equally likely the sender of the non-empty message; thus, all users form an anonymity set.

[Figure 1: Alice submits key 101 ⊕ message 000 = cipher 101, Bob submits key 110 ⊕ message 101 = cipher 011, Charlie submits key 011 ⊕ message 000 = cipher 011; the public sum of all ciphers yields the message 101.]

Figure 1: Illustration of a DC network. Bob sends the message 101 (encrypted with the key 110) which can be read by all three users, but neither Alice nor Charlie learn that Bob was the sender.
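As an illustration of this round (our sketch; the three-party setup and values are taken from Figure 1, the code itself is not part of the thesis):

```python
# Minimal sketch of one DC-network round over 3-bit words (illustrative).
# Each user broadcasts pad XOR message; the pads cancel out, so the XOR of
# all broadcasts reveals the single non-empty message, but not its sender.
from functools import reduce

def dc_round(pads, messages):
    """XOR-combine each user's pad with his message and sum publicly."""
    assert reduce(lambda a, b: a ^ b, pads) == 0, "pads must cancel out"
    ciphers = [p ^ m for p, m in zip(pads, messages)]
    return reduce(lambda a, b: a ^ b, ciphers)  # the public sum

# Values from Figure 1: Alice, Bob, Charlie.
pads = [0b101, 0b110, 0b011]       # bitwise sum is zero
messages = [0b000, 0b101, 0b000]   # only Bob sends a real message
assert dc_round(pads, messages) == 0b101  # all learn 101, none learns 'Bob'
```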

Though originally working in cryptography, Chaum pioneered the research on privacy-enhancing technology (PET). Besides the DC network, he also laid the foundations for another PET, the mix [8]. Mixes are network proxies with some interesting properties; see [15, 17] for recent surveys of different types of mixes and Figure 2 for an illustration of a mix. The term ‘mix’ refers to the mode of operation of these proxies, whose main purpose is to disturb (or ‘mix’) the relation between incoming and outgoing messages. A user who is sending a message to the mix can thus be certain (to the extent security is provided by the methods used, e. g., public-key cryptography [37]) that his message can not only be related to him but equally well to all other users who sent a message to the mix at the same time. The user is hiding in the anonymity set of all senders.

Mix users are not required to send messages continuously over time, i. e., over several mix rounds. This allows a number of attacks, and though these attacks do not help to reveal a sender–message relation, they are efficient in uncovering sender–recipient relations. The simplest attack is intersecting two or more anonymity sets which have been observed when a specific recipient got a message. This attack and variations of it are known as long-term intersection attacks [6]. If the accuracy of the result is only relevant up to a certain error margin, there is an even more efficient class of attacks, the statistical disclosure attacks [14, 26], which also take into account the probabilities of mix users in each anonymity set being the sender of a message to a specific recipient. In the mix round illustrated in Figure 2, we see that all senders submitted one message, but recipient 1 received two messages while recipient 2 got only one. The probability of sender A being the sender of a message to recipient 1 is thus 2/3, while the probability of A being the sender of a message to recipient 2 is only 1/3.

[Figure 2: Senders A, B, and C each submit one message to the mix; two messages leave the mix towards recipient 1 and one towards recipient 2.]

Figure 2: Illustration of a mix. The mix operation makes it impossible to trace an incoming message through the mix and relate it to an outgoing message.

This probability distribution is not represented in the anonymity set notion. A more general notion is the degree of anonymity [18, 39] which uses entropy in Shannon’s [40] sense. The entropy of the probability distribution of individuals within the anonymity set can be interpreted as the average information an attacker would learn when a relation between sender and recipient is revealed to him. When the attacker has not yet discovered this relation, the entropy can be understood as a measure of the effort a successful attack on the mix would require on average. A third interpretation of the entropy is the (logarithm of the) average anonymity set size provided by the mix round.
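As a concrete illustration (our sketch, not code from the appended papers), the degree of anonymity of [18, 39] can be computed from an assumed sender distribution as follows:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def degree_of_anonymity(probs):
    """Entropy normalised by its maximum log2(n), following [18]."""
    return entropy(probs) / math.log2(len(probs))

# Uniform beliefs over three senders: maximal anonymity.
print(entropy([1/3, 1/3, 1/3]))              # ~1.585 bits, log2 of set size 3
print(degree_of_anonymity([1/3, 1/3, 1/3]))  # 1.0

# Skewed beliefs, e.g., after a statistical disclosure attack.
print(entropy([1/2, 1/4, 1/4]))              # 1.5 bits
print(degree_of_anonymity([1/2, 1/4, 1/4]))  # ~0.946
```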

We use a notion similar to the anonymity set in Paper I, simulate the effect of long-term intersection attacks and measure the sizes of the anonymity sets in Paper II, and generalise entropy-based metrics in Paper III.
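The core of a long-term intersection attack, of the kind simulated in Paper II, can be sketched in a few lines; the observations below are invented for illustration:

```python
# Sketch of a long-term intersection attack (made-up observations, not the
# study's data). Whenever the target recipient receives a message, the
# attacker records the anonymity set of that round and intersects over time.
rounds_with_target = [
    {"alice", "bob", "carol"},   # anonymity set observed in round 1
    {"bob", "carol", "dave"},    # round 2
    {"bob", "erin"},             # round 3
]

suspects = set.intersection(*rounds_with_target)
print(suspects)  # {'bob'}: the anonymity set shrinks to the persistent sender
```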

2.2 Unlinkability

Pfitzmann and Hansen define unlinkability as follows:

“Unlinkability of two or more items of interest (IOIs, e. g., subjects, messages, actions, . . . ) from an attacker’s perspective means that within the system (comprising these and possibly other items), the attacker cannot sufficiently distinguish whether these IOIs are related or not.” [32]

Unlinkability generalises anonymity, i. e., unlinkability describes the non-existence of a relation between two items of interest (possibly including subjects) and anonymity describes the non-existence of a relation between a subject and an item of interest (possibly another subject). Thus, every attack on anonymity is at the same time an attack on unlinkability, but anonymity metrics must be seen as special cases of unlinkability metrics. For instance, Steinbrecher and Köpsell [41] use equivalence class membership relations to model the relations between items of interest. Probability distributions on these membership relations are used to model the degree of unlinkability.
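A toy version of this modelling step (our illustration, loosely following the equivalence-class view of [41]) assigns the attacker’s probabilities to the hypotheses ‘related’ and ‘unrelated’ and reuses entropy as a degree of unlinkability:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Attacker's belief about whether two messages belong to the same
# equivalence class, i.e., stem from the same sender.
belief = {"related": 0.8, "unrelated": 0.2}

# Degree of unlinkability: 1 means the attacker is maximally unsure,
# 0 means the (non-)relation is certain.
degree = entropy(belief.values()) / math.log2(len(belief))
print(round(degree, 3))  # ~0.722: the relation is suspected but not certain
```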

The generalisation from subjects and messages to items of interest makes it possible to abstract away from pure network-based scenarios, e. g., to measuring the linkability of personal data to individuals. Early work motivated by this scenario provided confidentiality metrics for statistical databases [16, 20, 38, 42] and has later been adapted [10] to the unlinkability notion. This new scenario was also the motivation for research on a branch of identity management systems that supports individuals in managing their personal data disclosure with regard to anonymity and unlinkability. The first solutions for such privacy-enhancing identity management [7, 11] led to a refinement of the requirements [21] and culminated in several projects [22, 25, 30] funded by the European Community and other institutions [29].

Even though the objective of Paper I is proposing anonymity metrics, it uses several (‘formal scaling’) steps where no subject is involved. These intermediate steps can be understood as unlinkability metrics. In Paper III, we use anonymity metrics as an example which can easily be generalised to unlinkability metrics.

2.3 Information Asymmetry

The purpose of PETs, such as DC networks and mixes, is to minimise the flow of personal data¹ on a technical level. The term ‘data minimisation’, coined in the privacy community and used in legislation such as the German ‘Bundesdatenschutzgesetz’, refers to the strategy of individuals choosing the smallest set of personal data to disclose to a data controller for achieving a specific goal, e. g., selling or purchasing goods of any kind or subscribing to and using a service.

The effects of data minimisation can be better explained in the context of contract theory [3]: in a market with asymmetric distribution of information among the parties, the ones with more information have an incentive to use their information against the parties with less information [1]. A data controller sharing personal data of an individual with a third party without informing the individual creates an information asymmetry in which the controller possesses more information than the individual. The individual (benefits or) suffers from the consequences of the distribution of its data, i. e., the data controller creates (positive or) negative externalities [19] for the individual [24, 43]. PETs help to reduce information asymmetries, and thus negative externalities, by reducing the flow of data from the individual to the data controller.

This view suggests that data minimisation is not the only strategy for reducing information asymmetries, but rather has an equivalent sibling, increasing transparency, i. e., increasing the flow of information from the data controller to the individual. Bellotti and Sellen [5] discuss both strategies and the relation between them.

While PETs are effective at the time of the data disclosure, transparency-enhancing tools (TETs) may be effective either as predictive tools (ex-ante) before the data disclosure or as retrospective tools (ex-post) after the fact [23]. Predictive TETs can be used by individuals as decision support tools that anticipate the future consequences of a data disclosure, while retrospective TETs inform the individuals about the de facto consequences of their data disclosures. Both types can be mixed so that consequences of past data disclosures can be taken into account for future data disclosures.

¹ Broader definitions of PETs can be found in the literature that also capture technology such as access control, identity management, and even TETs (described later). In this thesis, we restrict the meaning of PETs to data minimisation.


The metrics in Papers I and III can be conceived as predictive TETs, as can the outcome of the experiment in Paper II and the contract language in Paper IV.

2.4 Privacy Policy Languages

Privacy policies describe which personal data is collected by a data controller and what is going to happen with the data. The policies can be seen as predictive TETs which inform the individual about the intended usage of its data by the data controller. With this information, the individual can choose among data controllers by choosing the privacy policy that matches the individual’s preferences best. If preferences and policies are machine-processible, the choice of a data controller could even be automated, e. g., as part of an identity management system.
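As a toy illustration of such automated matching (our sketch with an invented vocabulary, not actual P3P or APPEL syntax):

```python
# A policy as a set of statements (data item, purpose, recipient); a
# preference is a predicate over statements. The vocabulary is invented.
policy_shop_a = {("email", "delivery", "ours"), ("email", "marketing", "ours")}
policy_shop_b = {("email", "delivery", "ours")}

def acceptable(statement):
    """The individual's preference: no use of data for marketing."""
    _data, purpose, _recipient = statement
    return purpose != "marketing"

def matches(policy):
    return all(acceptable(s) for s in policy)

# An identity management system could pick the data controller automatically:
shops = [("shop A", policy_shop_a), ("shop B", policy_shop_b)]
print([name for name, policy in shops if matches(policy)])  # ['shop B']
```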

Several specifications of formal languages for privacy policies and preferences have been proposed. Among the first privacy policy languages, and the first that became a standard, was the platform for privacy preferences (P3P) [13, 36]. It defines a number of assertions, partly formalised and partly free-text assertions, for the content category of the collected data, the purpose of the data collection, whether the data is used in a personally identifiable way, and which parties receive the data [36]. In conjunction with P3P, ‘a P3P preference exchange language’ (APPEL) [12] has been proposed as the corresponding privacy preference language. The current working draft of the language makes it possible to specify the P3P assertions acceptable for the individual by stating simple expressions.

Another privacy policy language is the enterprise privacy authorization language (EPAL) [4] which contributed a new concept to privacy policies, the obligation. According to the specification, obligations are additional steps to be taken when a certain action is performed [4].

EPAL has been discontinued and remained a submission for standardisation. A competing language for privacy policies, the extensible access control markup language (XACML) [27, 28], became a standard earlier and has been found more flexible than EPAL [2].

As opposed to P3P, both EPAL and XACML are access control languages. This line has been continued in the PrimeLife policy language (PPL) [35] which extends XACML and specifies a number of subdialects, including an equivalent for privacy preferences. The access control rules in all of these three languages are optimised for efficient decisions on access rights, but are considerably less suited for the anticipation of access requests, i. e., the anticipation of what data is requested and when. Their use as a predictive TET is therefore limited.

In Paper IV, we propose a language for privacy contracts which is better suited as a predictive TET and provides a more general model for obligations than EPAL and XACML.

3 Research Questions

This thesis addresses the following research questions:

1. How can possible information asymmetries between data subjects and data controllers be reduced?

Possible solutions to this question are discussed in all four papers, mainly in the form of transparency-enhancing technologies or tools (TETs). Papers II, III, and IV provide predictive TETs, and Paper I outlines a solution which can work as a predictive as well as a retrospective TET. In addition, the canonical form of privacy option contracts in Paper IV can be seen as a tool for data minimisation, thus as a PET.

2. How can we formally model the effect of uncertainty about the validity of personal data?

In Papers I and II, we make the assumption that privacy, once lost, cannot be regained. In reality, however, the longer the duration of storing data without validation or update, the greater is the likelihood that the stored data has become invalid. For the data controller, this means that the data becomes increasingly useless, and the data subject thus implicitly regains its privacy. A mathematical model of this process gives interesting insights into how the expected usefulness of the data for the data controller and the expected infringement on the privacy of the data subject can evolve over time. We discuss a first approach to such a mathematical model in Paper III.

4 Research Methods

The main research method used in this thesis is the mathematical method [33]. Besides that, we used the scientific method [34] in Paper II. Both methods have a lot in common, but differ in their details. We briefly discuss them in this section.

Both methods consist of four steps. The first step in the mathematical method is understanding the problem. It is similar to the first step, characterisation of the problem, in the scientific method. In both methods, this step is to identify the unknown, the data, and the conditions, and to formulate them in suitable notation.

The second step in the mathematical method is devising a plan (also referred to as analysis), and in the scientific method, it is the formulation of the hypothesis. In both cases, the second step can be summarised as finding a connection between the data and the unknown. While the hypothesis in the scientific method usually is a mathematical model, the plan in the mathematical method can also be an analogy to a similar (or related) problem or a proof idea.

The third step can differ between the two methods. In the mathematical method, it is carrying out the plan (synthesis), i. e., specifying the analogy or carrying out the proof. In the scientific method, it is a deduction, i. e., instantiating the hypothesis to predict the outcome of laboratory experiments or observations in nature.

The fourth step in the mathematical method is looking back. Its main purpose is to review the results from the previous steps, i. e., validating or verifying the results, possibly simplifying the plan, and looking for other applications of the same solution, i. e., generalising the solution. In the scientific method, the fourth step is mainly concerned with validating the predictions from the previous step in experiments.

In Papers I, III, and IV, we applied the mathematical method. The approach used in Paper I includes the reformulation of the problem, i. e., analysing the relation of disclosed data in terms of a similar problem, Formal Concept Analysis, for which a solution is known. In Paper III, the problem is to understand and model the uncertainty in privacy measurement, and the solution is an analogy to the uncertainty model in option pricing. Once the analogy is exposed, the model could be adapted with small adjustments. In Paper IV, the same analogy is used to adapt the syntax and semantics of a language for specifying financial option contracts in order to specify a language for terms and conditions of data disclosures. It is even proved that contracts of that language converge to a canonical form.

In Paper II, we use the scientific method to carry out an experiment with data gathered from an operational anonymity service. The experiment is a simulation of two attacks, the intersection and the cross-section attack, which are likely to be mounted on similar data. Simulation has been chosen over collecting more detailed data, which would have allowed real measurement, in order to raise the acceptance among possible participants of the experiment.

5 Contributions

This section summarises the contributions made in this thesis.

1. Proposal of an ontological approach to analysing the linkability of individuals’ data. Paper I discusses how individuals (possibly with support of the data controller) can keep track of what personal data has been sent to data controllers. An ontological approach, Formal Concept Analysis, is used for modelling explicit and implicit data disclosure, i. e., also modelling data that can be derived from disclosed data. Keeping track of data disclosures is, besides minimising data disclosure, a way of increasing transparency and therefore reducing information asymmetries (Research Question 1).

2. Identification of a new realistic attacker model and analysis of its effects on an operative anonymity service. In Paper II, a study is performed about the attacker’s knowledge gain to be expected from the information provided by a law-abiding data controller in the context of Germany’s first implementation² of the European Data Retention Directive (2006/24/EC). The results suggest that a random attacker would greatly benefit from the retained data, while the service would still provide enough anonymity to be of no considerable use in court when only relying on single requests to the anonymity service. The attacker model and the study contribute to Research Question 1 by informing users of anonymity services about the data that will be retained by the service provider and about the utility of that data with respect to different attacker models.

² The law has been enacted by the German government in December 2007 and nullified by the German Federal Constitutional Court in March 2010.

3. Identification of the analogy between data disclosure and option contracts known from economics. Paper III discusses the analogy of factors relevant for valuating the monetary value of financial options and the informational value of disclosed data. The analogy to financial options is called Privacy Options. This finding indirectly contributes to Research Question 2 by introducing a thought pattern that is used for making the next contribution.

4. Proposal of a model of the uncertainty in anticipating, today, the future consequences of personal data disclosure, i. e., for valuating informational privacy. Paper III uses the analogy between financial options and data disclosure and adapts the mathematical methods accounting for the uncertainty in option pricing to modelling the uncertainty in valuating informational privacy. This finding directly contributes to Research Question 2. The anticipation of the future consequences can also be understood as enhancing transparency and therefore reducing the information asymmetry (Research Question 1).

5. Specification of a privacy contract language which generalises the notion of time and events in privacy policies, defines permissions in terms of obligations instead of artificially separating these terms, and provides a canonical form for avoiding unintentionally creating covert channels by exploiting degrees of freedom in the language syntax. The contract language specified in Paper IV establishes the connection between privacy policy languages and Privacy Options and outlines a framework in which the uncertainty model introduced in the previous contribution can be applied. The contracts specify what is going to happen with the disclosed data, thus form a predictive TET, and therefore contribute directly to Research Question 1. Initiating new risks to privacy is avoided, e. g., by specifying a canonical form. This can be seen as one aspect of minimising the disclosed data and thus reducing information asymmetries as a PET.

6 Summary of Appended Papers

Paper I – Linkability Estimation Between Subjects and Message Contents Using Formal Concepts

This paper presents an ontological approach to analysing linkability of data disclosure. The analysis uses a directed graph in which nodes represent data items, messages, pseudonyms of the data subject, and the data controllers. Edges represent the relations between the node entities. These graphs are created from incidence matrices by means of Formal Concept Analysis.
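To give a flavour of this construction (a toy sketch with invented incidence data; Paper I defines the actual procedure), formal concepts can be enumerated from an incidence matrix as follows:

```python
# Toy Formal Concept Analysis: objects are messages, attributes are the data
# items they disclose; a formal concept pairs a set of messages with exactly
# the data items they all share (extent and intent determine each other).
from itertools import combinations

incidence = {
    "m1": {"name", "email"},
    "m2": {"name", "email", "address"},
    "m3": {"email"},
}

def intent(objects):
    """Attributes common to all given objects (all attributes if none)."""
    rows = [incidence[o] for o in objects]
    return set.intersection(*rows) if rows else set.union(*incidence.values())

def extent(attrs):
    """Objects that carry all given attributes."""
    return {o for o, row in incidence.items() if attrs <= row}

concepts = set()
for r in range(len(incidence) + 1):
    for combo in combinations(incidence, r):
        b = intent(set(combo))            # close each object set
        concepts.add((frozenset(extent(b)), frozenset(b)))

for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), "<->", sorted(b))    # the nodes of the concept lattice
```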

Paper II – Data Retention and Anonymity Services

This paper presents empirical results on the anonymity that can be provided within the terms of Germany’s first implementation of the European Data Retention Directive (2006/24/EC). This implementation required that, e. g., internet service providers retain traffic data, and thus specified a model for data collection. It also specified an attacker model by regulating the conditions under which the data could be accessed and by whom. The study is performed using an operative anonymity service and real user data. The degree of anonymity has been examined under the attacker model specified in the implementation of the European Data Retention Directive and under less restrictive assumptions about the attacker.

Paper III – Valuating Privacy with Option Pricing Theory

This paper presents a probabilistic framework for measuring informational privacy. The actual measurement of informational privacy can be done by any reasonable metric that deals with attribute frequencies. The framework adds the time dimension, i. e., a model for uncertainty about the individual’s attributes induced by the development of the individual and the population of all individuals over time. This uncertainty model builds on the analogy to uncertainty models in option pricing known from economics. The framework is thus called Privacy Option.
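The gist of the analogy can be conveyed by a deliberately simplified toy model (ours, not the framework of Paper III): a disclosed binary attribute flips with probability p per period, and the expected usefulness of the stored value when ‘exercised’ at time t is the probability that it is still valid:

```python
# Toy inter-temporal model (illustrative only). The stored attribute value
# matches the individual's current value with decreasing probability, so the
# controller's expected gain from using old data decays over time.
def validity(t, p):
    """P(stored value equals current value after t periods)."""
    valid = 1.0
    for _ in range(t):
        valid = valid * (1 - p) + (1 - valid) * p  # a flip back counts as valid
    return valid

for t in [0, 1, 5, 20, 100]:
    v = validity(t, p=0.1)
    print(f"t={t:3d}  usefulness~{v:.3f}  implicitly regained privacy~{1 - v:.3f}")
# The usefulness decays towards 1/2, where the stored value carries no
# information any more; exercising the 'privacy option' late is worth less.
```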

Paper IV – Towards a Formal Language for Privacy Options

This paper presents a formally specified language, POL, for Privacy Options. It particularly focuses on the aspects of time in Privacy Options. It is similar to privacy policy languages such as P3P and PrimeLife’s PPL, as both types of languages specify the rights and obligations of data controllers, but differs in its design as a contract model rather than an access control model. In POL, rights are defined as obligation contracts with the freedom to nullify them, i. e., rights and obligations are expressed with the same vocabulary, which allows defining obligations in a more flexible way than known from privacy policy languages. Besides, we present a canonical form of POL contracts which can be used to normalise the syntax of contracts and thus avoid unintentionally creating covert channels which could otherwise create new privacy risks.
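To hint at what rights-as-nullifiable-obligations look like, the following hypothetical combinators mimic the contract style (names and semantics are our illustration; the actual POL syntax and semantics are defined in Paper IV):

```python
# Hypothetical contract combinators in the spirit of POL (illustrative).
from dataclasses import dataclass

class Contract:
    pass

@dataclass
class Zero(Contract):      # the empty contract: nothing is owed
    pass

@dataclass
class Use(Contract):       # obligation to use a data item for a purpose
    data: str
    purpose: str

@dataclass
class Until(Contract):     # the obligation expires at time t
    t: int
    body: Contract

@dataclass
class Or(Contract):        # the holder may choose either branch
    left: Contract
    right: Contract

def right(c: Contract) -> Contract:
    """A right is an obligation that the holder is free to nullify."""
    return Or(c, Zero())

# 'May use the email address for delivery until t = 30, but need not':
contract = Until(30, right(Use("email", "delivery")))
print(contract)
```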

7 Conclusions and Future Work

In this thesis, we have shown that privacy metrics are an important research area with challenging research questions. We have also presented several novel approaches for privacy metrics. The most important contribution was to underpin the uncertainty about the validity of collected data induced by time with a mathematical model, the Privacy Option. Various parameters of this model can be specified in terms of contracts in the Privacy Option Language, and privacy measurement can be defined as semantics of these contracts. Other contract semantics are conceivable, e. g., the data management semantics defined in Paper IV. We are convinced that future contract languages for privacy-enhancing identity management have to provide and cope (at least) with these two semantics equally well.

In Paper II, we have measured privacy with empirical data with regard to a realistic (law-abiding) attacker model and an operational anonymity service. The conclusion is that, despite public fears, privacy can be preserved quite well against the law-abiding attacker. However, attacks on the same data beyond the restrictions of the law-abiding attacker may have more severe consequences in terms of privacy decrease.

Paper I extends the anonymity set notion: data is either related to an individual or not. One of the most interesting challenges is to generalise the methods such that imperfect knowledge can be modelled in all relations, i. e., imperfect knowledge about deducible data items as well as imperfect knowledge about the linkability of data to an individual. Paper III can be seen as a follow-up in this regard, since modelling imperfect knowledge is the main subject of this paper. In order to present the metrics in Paper III in a compact way, we limited our model of personal data to the most basic case, i. e., one attribute with two attribute values. In future work, we will generalise this data model. Paper IV outlines the plan for this work. Another opportunity for future work is the validation of the privacy metrics presented in Paper III with empirical data.

Future work with regard to the contract language specified in Paper IV may focus on new languages that wrap the contracts and put them in a context, e. g., specifying the contract parties. New languages that can be wrapped by the contract language are also conceivable, e. g., expression languages that refine the data model so that contracts can be made that allow, disallow, or enforce specific operations on the data.

References

[1] George Arthur Akerlof. The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3):488–500, 1970.

[2] Anne Anderson. A comparison of two privacy policy languages: EPAL and XACML. Technical report, Sun Microsystems Laboratories, 2005.

[3] Kenneth Joseph Arrow. Uncertainty and the welfare economics of medical care. The American Economic Review, 53(5):941–973, 1963.

[4] Paul Ashley, Satoshi Hada, Günter Karjoth, Calvin Powers, and Matthias Schunter. Enterprise privacy authorization language (EPAL 1.2). Member submission, W3C, 2003.

[5] Victoria Bellotti and Abigail Sellen. Design for privacy in ubiquitous computing environments. In Giorgio de Michelis, Carla Simone, and Kjeld Schmidt, editors, Proceedings of the Third European Conference on Computer-Supported Cooperative Work (ECSCW’93), pages 77–92. Kluwer Academic Publishers, 1993.

[6] Oliver Berthold. Effiziente Realisierung von Dummy Traffic zur Gewährleistung von Unbeobachtbarkeit im Internet [An efficient implementation of dummy traffic to ensure unobservability on the Internet]. Diploma thesis, Technische Universität Dresden, Faculty of Computer Science, Institute for Theoretical Computer Science, 1999.

[7] Oliver Berthold and Hannes Federrath. Identitätsmanagement [Identity management]. E-Privacy: Datenschutz im Internet (DuD-Fachbeiträge), pages 189–204, 2000.

[8] David Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM, 24(2):84–90, 1981.

[9] David Chaum. The dining cryptographers problem: Unconditional sender and recipient untraceability. Journal of Cryptology, 1(1):65–75, 1988.

[10] Sebastian Clauß. A framework for quantification of linkability within a privacy-enhancing identity management system. In Günter Müller, editor, Emerging Trends in Information and Communication Security (ETRICS), volume 3995 of Lecture Notes in Computer Science, pages 191–205, Berlin Heidelberg, 2006. Springer.

[11] Sebastian Clauß and Marit Köhntopp. Identity management and its support of multilateral security. Computer Networks, 37(2):205–219, 2001.

[12] Lorrie Faith Cranor, Marc Langheinrich, and Massimo Marchiori. A P3P preference exchange language 1.0 (APPEL1.0). Working draft, W3C, 2002.

[13] Lorrie Faith Cranor, Marc Langheinrich, Massimo Marchiori, Martin Presler-Marshall, and Joseph Reagle. The platform for privacy preferences 1.0 (P3P1.0) specification. Recommendation, W3C, 2002.

[14] George Danezis. Statistical disclosure attacks: Traffic confirmation in open environments. In Stefano Gritzalis, Sabrina De Capitani di Vimercati, Pierangela Samarati, and Georgios Katsikas, editors, Proceedings of Security and Privacy in the Age of Uncertainty (SEC2003). Kluwer, 2003.


[15] George Danezis and Claudia Díaz. A survey of anonymous communication channels. Technical Report MSR-TR-2008-35, Microsoft Research, 2008.

[16] Dorothy Elizabeth Denning, Peter James Denning, and Mayer Dlugach Schwartz. The tracker: A threat to statistical database security. ACM Transactions on Database Systems, 4(1):76–96, 1979.

[17] Claudia Díaz and Bart Preneel. Taxonomy of mixes and dummy traffic. In Proceedings of I-NetSec04: 3rd Working Conference on Privacy and Anonymity in Networked and Distributed Systems. Springer, 2004.

[18] Claudia Díaz, Stefaan Seys, Joris Claessens, and Bart Preneel. Towards measuring anonymity. In Paul Syverson and Roger Dingledine, editors, Workshop on Privacy Enhancing Technologies, volume 2482 of LNCS. Springer, 2002.

[19] Howard Sylvester Ellis and William Fellner. External economies and diseconomies. The American Economic Review, 33(3):493–511, 1943.

[20] Simone Fischer-Hübner. IT-Security and Privacy: Design and Use of Privacy-Enhancing Security Mechanisms, volume 1958 of LNCS. Springer, Berlin Heidelberg, 2001.

[21] Marit Hansen, Peter Berlich, Jan Camenisch, Sebastian Clauß, Andreas Pfitzmann, and Michael Waidner. Privacy-enhancing identity management. Information Security Technical Report, 9(1):35–44, 2004.

[22] Marit Hansen and Henry Krasemann. Privacy and identity management for Europe – PRIME white paper. Project Deliverable D15.1.d, Privacy and Identity Management for Europe (PRIME, Project within the European Community’s 6th Framework Program, No. 507591), 2005.

[23] Mireille Hildebrandt. Behavioural biometric profiling and transparency enhancing tools. Project Deliverable 7.12, Future of Identity in the Information Society (FIDIS, Network of Excellence within the European Community’s 6th Framework Program, No. 507512), 2009.

[24] Xiaodong Jiang, Jason Hong, and James Landay. Approximate information flows: Socially-based modeling of privacy in ubiquitous computing. In Gaetano Borriello and Lars Holmquist, editors, UbiComp 2002: Ubiquitous Computing, volume 2498 of LNCS, pages 176–193. Springer, 2002.

[25] Eleni Kosta and Jos Dumortier. Contextual framework. Project Deliverable D2.3, Privacy and Identity Management for Community Services (PICOS, Project within the European Community’s 7th Framework Program, No. 215056), 2008.

[26] Nick Mathewson and Roger Dingledine. Practical traffic analysis: Extending and resisting statistical disclosure. In David Martin and Andrei Serjantov, editors, Proceedings of the 4th International Workshop on Privacy-Enhancing Technologies 2004, volume 3424 of LNCS, pages 17–34. Springer Berlin/Heidelberg, 2005.

[27] Tim Moses. eXtensible Access Control Markup Language (XACML) version 2.0. Standard, OASIS, 2005.

[28] Tim Moses. Privacy policy profile of XACML v2.0. Standard, OASIS, 2005.

[29] PETweb II partners. Privacy-respecting identity management for e-Norge (PETweb II, funded by the Research Council of Norway in the VERDIKT program, no. 193030). http://petweb2.projects.nislab.no/. Last accessed: April 04, 2011.

[30] PrimeLife partners. PrimeLife: Bringing sustainable privacy and identity management to future networks and services. Project Deliverable D3.1.2, Privacy and Identity Management in Europe for Life (PrimeLife, Project within the European Community’s 7th Framework Program, No. 216483), 2008.

[31] Andreas Pfitzmann and Marit Hansen. Anonymity, unobservability and pseudonymity – a proposal for terminology. In Designing Privacy Enhancing Technologies, Proceedings of the Workshop on Privacy-Enhancing Technology (PET) 2000, volume 2009 of LNCS, pages 1–9. Springer, 2001.

[32] Andreas Pfitzmann and Marit Hansen. Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management – A consolidated proposal for terminology. http://dud.inf.tu-dresden.de/Anon_Terminology.shtml, 2010. (Version 0.34).

[33] György (George) Pólya. How to Solve It: A New Aspect of Mathematical Method. Princeton University Press, 2nd edition, 1957.

[34] Karl Raimund Popper. The Logic of Scientific Discovery. Hutchinson, London, 1959.

[35] Dave Raggett. Draft 2nd design for policy languages and protocols. Project Heartbeat 5.3.2, PrimeLife project of the European Community’s 7th Framework Program, No. 216483, 2009.

[36] Joseph Reagle and Lorrie Faith Cranor. The platform for privacy preferences. Communications of the ACM, 42(2):48–55, 1999.

[37] Ron Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, 1978.

[38] Jan Schlörer. Zum Problem der Anonymität der Befragten bei statistischen Datenbanken mit Dialogauswertung [On the problem of respondents’ anonymity in statistical databases with dialogue analysis]. In D. Siefkes, editor, 4. GI-Jahrestagung, volume 26 of LNCS, pages 502–511. Springer, 1975.


[39] Andrei Serjantov and George Danezis. Towards an information theoretic metric for anonymity. In Paul Syverson and Roger Dingledine, editors, Workshop on Privacy Enhancing Technologies, volume 2482 of LNCS, pages 41–53. Springer, 2002.

[40] Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.

[41] Sandra Steinbrecher and Stefan Köpsell. Modelling unlinkability. In Roger Dingledine, editor, Workshop on Privacy Enhancing Technologies, volume 2760 of LNCS, pages 32–47. Springer, 2003.

[42] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571–588, 2002.

[43] Hal Ronald Varian. Internet Policy and Economics – Challenges and Perspectives, chapter Economic Aspects of Personal Privacy, pages 101–109. Springer, 2009. Published in Privacy and Self-Regulation in the Information Age, a report issued by the NTIA, 1997.

[44] Gilbert Sandford Vernam. Secret signaling system. United States Patent Office, 1919. Patent no. 1,310,719.

[45] Alan Furman Westin. Privacy and Freedom. Atheneum, New York, 1967.

