
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master thesis, 30 ECTS | Datateknik

2019 | LIU-IDA/LITH-EX-A--19/079--SE

Characterizing the HTTPS Trust Landscape

A Passive View from the Edge

Karaktärisering av HTTPS Förtroende-Landskap

Gustaf Ouvrier

Supervisors: Niklas Carlsson, Martin Arlitt
Examiner: Niklas Carlsson


Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Our society increasingly relies on the Internet for common services like online banking, shopping, and socializing. Many of these services heavily depend on secure end-to-end transactions to transfer personal, financial, or other sensitive information. At the core of ensuring secure transactions are the TLS/SSL protocol and the “trust” relationships between all involved partners. In this thesis we passively monitor the HTTPS traffic between a campus network and the Internet, and characterize the certificate usage and trust relationships in this complex landscape. By comparing our observations against known vulnerabilities and problems, we provide an overview of the actual security that typical Internet users (such as the people on campus) experience. Our measurements cover both mobile and stationary users, consider the involved trust relationships, and provide insights into how the HTTPS protocol is used and the weaknesses observed in practice.


Contents

Abstract

Contents

List of Figures

List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research Questions
1.4 Delimitations
1.5 Contributions

2 Theory
2.1 Security Aspects
2.2 Cryptographic Primitives
2.3 Transport Layer Security
2.4 Public Key Infrastructure
2.5 Validating Certificates
2.6 Certificate Issuance

3 Method
3.1 Data Collection
3.2 Data Processing
3.3 Analyzing the Data

4 Results
4.1 Summary of Dataset
4.2 Ratio between HTTP and HTTPS
4.3 Trust in Browsers
4.4 Trust in Certificate Authorities
4.5 Trust in Protocol Version and Cipher Suite Selection
4.6 Session Quality Evaluation

5 Discussion
5.1 Ratio between HTTP and HTTPS
5.2 Trust in Browsers
5.3 Trust in Certificate Authorities
5.4 Trust in Protocol Version and Cipher Suite Selection
5.5 Methodology

6 Conclusion
6.1 What are the Most Significant Trust Relationships in HTTPS Communication and How Trustworthy are They Actually in Practice?
6.2 Are There Any Significant Differences between the Security of Mobile and Stationary User Devices in HTTPS Communication?
6.3 What Is the Quality of the Actual Security that Typical Users Experience when Accessing the Internet Using HTTPS?
6.4 Future Work

Bibliography


List of Figures

1.1 Relationships and involved parties.
2.1 Hash function.
2.2 Symmetric-key cryptography.
2.3 Message authentication code.
2.4 Asymmetric-key cryptography.
2.5 Digital signature scheme.
2.6 Man in the middle attack scenario.
2.7 Full TLS handshake.
2.8 Abbreviated TLS handshake.
2.9 Simplified scenario of PKIX in the TLS protocol.
2.10 Certificate landscape example.
2.11 Schematic view of the X.509 version 3 certificate format.
2.12 Chrome browser certificate validation indicator.
3.1 Log file processing.
4.1 Number of established sessions plotted over time. Shows the total number of sessions as well as the subsets of sessions using HTTP and HTTPS.
4.2 Cumulative distribution function (CDF) of certificate validity period lengths.
4.3 Complementary cumulative distribution function (CCDF) of the number of domain names per certificate.
4.4 Clustered histogram showing the relation between offered and used protocol versions. Each cluster shows the observed shares for the total number of sessions as well as the mobile and stationary subsets. The Y-axis uses a log scale for better presentation of the small measurements.
4.5 Top-15 encryption ciphers selected by the server in the cipher suite selection process. Each cluster shows a breakdown of the observed frequency for four subsets: sessions using mobile devices, stationary devices, protocol TLSv10, and protocol TLSv12.
4.6 Top-15 encryption ciphers offered by the client in the cipher suite selection process. Each cluster shows a breakdown of the observed frequency for four subsets: sessions using mobile devices, stationary devices, protocol TLSv10, and protocol TLSv12.
4.7 Complementary cumulative distribution function (CCDF) of cipher suite list sizes offered by the client and cumulative distribution function (CDF) of downgrades by the server.
4.8 Cumulative distribution function (CDF) of the RC4 cipher when chosen by the server.
4.9 Clustered histogram of the session quality evaluation based on the four-level security classification.


List of Tables

4.1 Dataset overview.
4.2 Browser share.
4.3 Chrome version distribution.
4.4 Safari version distribution.
4.5 Firefox version distribution.
4.6 Internet Explorer version distribution.
4.7 Top-10 organizations signing leaf certificates.
4.8 Top-six certificate authorities signing EV certificates.
4.9 Certificate signature algorithms grouped by type.
4.10 Certificate public keys grouped by type and key size.
4.11 Domain name validation.
4.12 Certificate validation.
4.13 Key exchange algorithms used.
4.14 Top-10 key exchange algorithms offered.
4.15 Export-grade key exchange algorithms offered.
4.16 Encryption algorithms used.
4.17 Encryption algorithms offered.
4.18 MAC algorithms used.
4.19 MAC algorithms offered by the client.
5.1 Browser usage and version distribution.
A.1 Protocol versions offered/used. Data points for Figure 4.4.
A.2 Cipher suites used. Data points for Figure 4.5.
A.3 Cipher suites offered. Data points for Figure 4.6.
A.4 Offered key exchange algorithms. Full version of Table 4.14.
A.5 List size and downgrades. The majority of the data points for Figure 4.7.


1 Introduction

1.1 Motivation

We are living in an information society in which organizations and individual users increasingly rely on the end-to-end security and privacy offered by the HTTPS protocol. With HTTPS, regular Hypertext Transfer Protocol (HTTP) requests and responses are securely transferred over an end-to-end connection encrypted using Transport Layer Security (TLS) or its predecessor Secure Sockets Layer (SSL).

With increased value of the information exchanged over the Internet, it is perhaps not surprising that HTTPS usage is increasing [21]. HTTPS can provide secure end-to-end transfers of money and other sensitive information, and is often used by authentication-based services such as online banking, shopping sites, and social networking services. With increased awareness of wiretapping and manipulation of network traffic, HTTPS has also become common among services that have not traditionally used secure end-to-end connections.


The “trust” relationships between all involved parties and the TLS/SSL protocol suite are the basis for ensuring secure transactions over HTTPS. To untangle the trust relationships, consider a user (Figure 1.1) accessing a website using HTTPS. First, the user must trust the browser’s implementation of HTTPS. Additionally, the browser must be kept up to date to protect against the latest known security vulnerabilities.

Second, the browser (and implicitly the user) needs to trust the Certificate Authorities (CAs) in the browser’s root store. If a trusted CA is compromised or mistakenly starts generating certificates for non-trusted servers, this significantly compromises the security that a browser provides. Unsurprisingly, different browsers select different sets of CAs to trust, and different web services select different CAs for their certificates.

Third, the browser needs to trust the server that it communicates with. This trust is often built around X.509 certificates signed by the CAs in the root store or by other trusted entities to which the CAs have delegated part of this responsibility. These chained trust relationships are further complicated by (chained) certificates often being valid for different time periods and being difficult to invalidate when trust relationships are broken.

Finally, the user must trust the cipher suite negotiated between the browser and server during the TLS handshake in the HTTPS connection establishment. Ciphers have different cryptographic strengths, and many ciphers in use today have known vulnerabilities and can therefore pose a significant risk to the confidentiality of the information transferred between the two parties.

Naturally, not all relationships are equally trustworthy. Hidden from a typical user, web sessions involve a diversity of servers, certificate authorities, certificates, and ciphers. This is why it is important to regularly check for security weaknesses in this complicated ecosystem.

1.2 Aim

The aim of this thesis is to characterize the trust landscape and the security risks observed in practice for each of the above described relationship types, and to discuss our findings in the context of known vulnerabilities. This goal will be achieved by passively collecting and analyzing the HTTPS traffic between a campus network and the Internet. This kind of network carries a great diversity of traffic from both mobile and stationary users, and will thereby provide an overview of the actual security that typical Internet users experience when browsing the web, as well as the trust relationships that are often invisible to the end user.

1.3 Research Questions

The aim of this thesis is formulated in the following research questions:

1. What are the most significant trust relationships in HTTPS communication and how trustworthy are they actually in practice?

2. Are there any significant differences between the security of mobile and stationary user devices in HTTPS communication?

3. What is the quality of the actual security that typical users experience when accessing the Internet using HTTPS?

1.4 Delimitations

This thesis will not evaluate the HTTPS and TLS protocols from a security perspective, but rather observe how the protocols are utilized in practice.


1.5 Contributions

The main contribution of this thesis is a characterization of how HTTPS is used in practice, which includes a head-to-head comparison of the usage observed among mobile and stationary users within the same campus network. Some of the major contributions of this thesis have been published in the following research articles:

1. Characterizing the HTTPS Trust Landscape: A Passive View from the Edge [23] Gustaf Ouvrier, Michel Laterman, Martin Arlitt, and Niklas Carlsson, IEEE Communications Magazine, volume 55, issue 7, July 2017, pp. 36–42.

2. A first look at the CT landscape: Certificate Transparency logs in practice [15] Josef Gustafsson, Gustaf Ouvrier, Martin Arlitt, and Niklas Carlsson, In Proc. Passive and Active Measurement Conference (PAM), Sydney, Australia, Mar. 2017, pp. 87–99.


2 Theory

2.1 Security Aspects

Cryptography can provide security for many different aspects of a system. In this section, we will give a brief description of a few basic security aspects relevant to the thesis: confidentiality, data integrity, and authentication.

2.1.1 Confidentiality

Confidentiality is the property whereby information/communication is made private or secret from unauthorized parties. Using cryptography, this is achieved through encryption. Encryption uses a key to render data unintelligible to everyone except authorized entities. Depending on the type of cryptographic algorithm, the same key or a different related key is used to decrypt the data. Decrypting the data renders the data intelligible again. In order to provide confidentiality, the cryptographic algorithm must be designed and implemented in such a way that an unauthorized party cannot determine the keys used for encryption/decryption or be able to derive the plaintext directly without having access to the correct keys.

2.1.2 Data Integrity

Data integrity is the property whereby unauthorized modification of information/communication can be detected. Modification of data includes insertion, deletion, and substitution. Using cryptography, this can be achieved through the use of digital signatures, hash functions, or message authentication codes.

2.1.3 Authentication

Authentication is the act of confirming the identity of something or someone. When it comes to computers this is normally the identity of a specific computer system or the identity of a user. Examples of authentication in a regular person’s life are plentiful; the simple act of unlocking your smartphone using a swipe pattern or PIN code, paying in a shop with a bank card and PIN code, or logging into a web service such as Facebook with a username and password.


One of the most common methods of authentication for a user is the requirement of a username and password combination when they log in to a computer system or service. In this case, the username and password represent a piece of information that only an authorized user should know. However, there are many other ways in which authentication can be done. The different types of authentication can be divided into three separate categories, also known as factors of authentication. These factors describe the different properties that can be used in order to authenticate an entity.

• Knowledge Factor: A knowledge factor is a secret which only the authenticating party should know. They are often passwords, PIN codes, or answers to personal questions.

• Ownership Factor: An ownership factor is something that only the authenticating party has access to. In computer systems this can for instance be a smart card, or it could be a certificate used to authenticate web servers. They can be represented in many ways using many types of technology.

• Inherence Factor: An inherence factor should describe something the authenticating party is or does. For a user/person this usually means biometrics such as fingerprints or retina patterns, but it could also be a behavioral trait such as the way someone writes their signature.

2.2 Cryptographic Primitives

Cryptographic primitives can be seen as the basic building blocks of cryptographic systems. When designing a cryptographic system it is rarely built completely from scratch. This is both because it is very difficult to design cryptographically secure algorithms and because analyzing and verifying that an algorithm is secure is much harder than verifying that it works. Instead, well established concepts and methods are often reused in order to create a cryptographic system that has the required properties (not necessary to reinvent the wheel every time you design a new car). In this section the most widely used primitives are briefly described.

2.2.1 Cryptographic Hash Functions

A cryptographic hash function is a cryptographic construct used in many information security applications and protocols such as digital signatures, integrity checking, and authentication. Just like a normal hash function, it takes a message M of any length as input and creates a short, fixed-length hash value, also known as a digest D, as shown in Figure 2.1. The hash value can be seen as a fingerprint of the input data. Unlike a normal hash function, the cryptographic hash function should also have the following properties:

Figure 2.1: Hash function.

• Pre-image resistance (one-way): It should be easy to compute the hash value but computationally hard to recreate the original message from its hash value. I.e., given D it should be difficult to find M such that D = H(M).

• Weakly collision-free: Given a message and its corresponding hash value, it should be computationally hard to find another, different message which, when the hash function is applied, results in the same hash value. I.e., given M1 it should be difficult to find M2 such that M1 ≠ M2 and H(M1) = H(M2).

• Strongly collision-free: It should be computationally hard to find any two messages where applying the hash function results in the same hash value. I.e., it should be difficult to find any pair M1 ≠ M2 where H(M1) = H(M2).

The last two properties are similar, with the third being more strictly defined. Not all cryptographic hash functions fulfill the third property. These properties ensure that digests created by a hash function can be trusted and that the hash function cannot be used to maliciously create or modify messages to have specific digests. Hash functions with these properties are considered cryptographic hash functions.

Common cryptographic hash functions include:

• MD5: Was once a widely used hash function, but is now considered cryptographically broken and usage is highly discouraged.

• SHA-1: Designed by NSA and published by the National Institute of Standards and Technology (NIST). Although cryptographically weak it is still widely used. It is being phased out in favor of SHA-2.

• SHA-2: The successor to SHA-1 and the current recommendation by NIST. The SHA-2 family consists of six hash functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.

• SHA-3: The newest SHA standard. It was the result of a competition that ended in 2012.
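These properties are easy to observe with Python's standard hashlib module; the message below is an arbitrary example:

```python
import hashlib

# A hash function maps input of any length to a short, fixed-length digest.
msg = b"Characterizing the HTTPS Trust Landscape"
d1 = hashlib.sha256(msg).hexdigest()
d2 = hashlib.sha256(msg + b"!").hexdigest()

print(len(d1))    # 64: SHA-256 digests are always 256 bits = 64 hex characters
print(d1 == d2)   # False: a one-byte change produces a completely different digest
```

The second print illustrates why a digest works as a fingerprint: even a minimal modification of the input is detectable.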

2.2.2 Symmetric-Key Cryptography

Symmetric-key cryptography, also known as private key cryptography, refers to cryptographic algorithms that use a single shared secret to perform both encryption and decryption. Figure 2.2 shows a typical scenario of using symmetric-key cryptography. Alice uses the key K to encrypt the message M creating cipher C. On the other end, Bob uses the same key K to decrypt the cipher C recreating M.

Figure 2.2: Symmetric-key cryptography.

Symmetric-key algorithms can be implemented in two different ways; as either stream or block ciphers. The difference between the two is how they process the data. Block ciphers work with fixed length blocks of data and encrypt/decrypt each block as a whole, while stream ciphers process individual bits or bytes one at a time.
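The shared-key principle can be sketched as a toy stream cipher: a keystream is derived from the key and XORed with the data, so the exact same operation both encrypts and decrypts. The keystream construction below is a hypothetical illustration only, not a vetted cipher (real systems use algorithms such as AES):

```python
import hashlib

def keystream(key: bytes):
    """Derive an endless byte stream from the key (toy generator, NOT secure)."""
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        yield from block
        counter += 1

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with the keystream.
    return bytes(b ^ k for b, k in zip(data, keystream(key)))

key = b"shared secret"
cipher = xor_cipher(key, b"attack at dawn")
plain = xor_cipher(key, cipher)   # applying the same key again recovers the message
print(plain)
```

Note how the symmetry of XOR makes this a stream cipher: both parties only need the shared key, which is precisely the key-distribution problem discussed below.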

Cryptographic algorithms based on symmetric keys are in general fast and computationally cheap compared to other types, and are therefore the most common choice for encrypting the bulk data sent over communication channels. However, the reason for their speed is also their main drawback. Symmetric-key algorithms depend on the existence of a shared secret, but do not define how it can be established. Securely sharing a secret between two remote parties is a very difficult problem. Communication sent over a network such as the Internet can be intercepted and modified, and it can therefore not be trusted to directly transfer a secret. The secret can of course be agreed upon beforehand, but that is not feasible in most situations. Different ways of establishing a key over an insecure channel are described in Section 2.2.5.

Common symmetric-key cryptographic algorithms include:

• RC4: One of the most popular stream ciphers, favoured for its simplicity and speed. However, it has known weaknesses and has repeatedly been shown to be easy to implement in very insecure ways, as shown by WEP [13]. Recent research has shown more vulnerabilities [2, 17, 22], and RFC 7465 [24] deprecates its use in TLS.

• Data Encryption Standard (DES): DES was once the predominant standard but has now been withdrawn as a standard by NIST. The use of all older versions of DES is highly discouraged, and the strongest variant, 3DES, is only recommended for legacy systems for compatibility purposes.

• Advanced Encryption Standard (AES)-128, -192, -256: AES is the established NIST standard. It is still considered strong encryption, and NIST recommends AES-256 for new systems. AES is by itself a block cipher, but there are modes of operation, such as counter (CTR) mode, that allow it to operate as a stream cipher.

2.2.3 Message Authentication Code

A Message Authentication Code (MAC) algorithm takes both an arbitrarily long message and a secret key as input to create a short, fixed length value. This value is called the MAC and is attached to the message to protect its data integrity as well as ensure its authenticity.

Figure 2.3: Message authentication code.

Figure 2.3 shows a typical scenario of using a MAC. Alice uses the key K to create a MAC of the message M. Both the message M and the MAC are sent. On the other end, Bob uses the key K to recreate the MAC in the same way as Alice did. Bob can then compare the two MACs and verify the authenticity and integrity of the message M.

Since the MAC algorithm requires a shared secret key, every party that knows this secret can both create and verify the MAC. This means that a MAC can only be used to verify that the message has been transferred without modification from a party that also knows the secret key. In cases with many involved parties, a MAC cannot be used to verify which specific party created it. The use of a shared secret also means that MAC algorithms have the same key-distribution problems as symmetric-key cryptography.

A common method of implementing MAC algorithms is to utilize cryptographic hash functions. This type of MAC is called an HMAC and is specified in RFC 2104 [18].
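This workflow maps directly onto Python's standard hmac module; the key and messages below are arbitrary examples:

```python
import hmac
import hashlib

key = b"shared secret"
msg = b"transfer 100 SEK to Bob"

# Sender: compute the MAC over the message and attach it.
tag = hmac.new(key, msg, hashlib.sha256).digest()

# Receiver: recompute with the same key and compare in constant time.
expected = hmac.new(key, msg, hashlib.sha256).digest()
print(hmac.compare_digest(tag, expected))   # True: message is unmodified

# Any modification of the message yields a different MAC.
tampered = hmac.new(key, msg + b"0", hashlib.sha256).digest()
print(hmac.compare_digest(tag, tampered))   # False: tampering is detected
```

Using compare_digest instead of == avoids leaking information through timing side channels during verification.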

2.2.4 Asymmetric-Key Cryptography

The difference between symmetric- and asymmetric-key cryptography is that while symmetric-key cryptography uses the same key for both encryption and decryption, asymmetric-key cryptography uses two different but mathematically related keys. The keys are referred to as the private key, which is to be kept secret, and the public key, which can be freely distributed to anyone. The existence of the public key is why asymmetric-key cryptography is also commonly known as public key cryptography. Asymmetric-key cryptography algorithms require a significantly higher amount of computational power and are therefore slow in relation to symmetric-key ciphers. This is why they are mostly used during connection establishment for authentication and key exchange purposes.

Asymmetric-key cryptography can mainly be used in two different ways, either as a public-key encryption system or a digital signature scheme.

Figure 2.4: Asymmetric-key cryptography.

In a public-key encryption system the public key is used to encrypt a message that thenceforth can only be decrypted using the paired private key. This allows anyone in possession of the public key to send encrypted messages to the private key holder. Figure 2.4 shows a typical scenario of using a public-key encryption system. Alice uses the public part of the key K to encrypt the message M creating cipher C. On the other end, Bob uses the private part of the key K to decrypt the cipher C recreating M.

Figure 2.5: Digital signature scheme.

In contrast, a digital signature scheme works the other way around. The private key holder creates a digital signature by signing a message. The signature can then be verified using the paired public key. Figure 2.5 shows a typical scenario of using a digital signature scheme. Alice uses the private part of the key K to create a signature S of the message M. Both the message M and the signature S are sent. On the other end, Bob uses the public part of the key K to verify that the signature S was indeed created from the message M using the paired private key. Just like a MAC, a digital signature ensures both the integrity and the authenticity of the message. Due to the use of asymmetric-key cryptography, it is possible to determine exactly who signed the message and therefore also establish non-repudiation. Digital signatures are central to many security schemes such as Public Key Infrastructures (PKI).
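Both uses can be illustrated with textbook RSA on toy parameters; the primes and message below are illustrative only and far too small for real use, which additionally requires padding schemes such as OAEP and PSS:

```python
# Textbook RSA with tiny primes -- purely illustrative toy values.
p, q = 61, 53
n = p * q                    # public modulus (3233)
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse (Python 3.8+)

m = 42                       # a message encoded as an integer < n

# Public-key encryption: encrypt with the public key, decrypt with the private key.
c = pow(m, e, n)
print(pow(c, d, n) == m)     # True: only the private key recovers m

# Digital signature: sign with the private key, verify with the public key.
s = pow(m, d, n)
print(pow(s, e, n) == m)     # True: anyone holding (e, n) can verify
```

The two directions use the same mathematics; which key is kept secret determines whether the scheme provides confidentiality or authenticity.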

Common asymmetric-key cryptography algorithms include:

• RSA: One of the first and still most popular asymmetric-key ciphers. There are variants both for public-key encryption and digital signature systems. NIST currently recommends a minimum key size of 2048-bit. Smaller key sizes are discouraged.

• ElGamal: Another asymmetric-key cipher, based on the Diffie-Hellman key exchange. There exist variants both for public-key encryption and digital signature systems, as well as variants based on Elliptic Curve Cryptography (ECC). NIST currently recommends a minimum key size of 224-255-bit.


• Digital Signature Algorithm (DSA): Also referred to as NIST’s Digital Signature Standard (DSS). DSA is a digital signature scheme based on ElGamal and also exists in an ECC variant. NIST currently recommends a minimum key size of 2048-bit, and 224-255-bit for the ECC variant.

2.2.5 Key Exchange

For symmetric-key ciphers and MACs to work, both communicating parties must be in possession of a shared secret key. This can be accomplished by agreeing upon the key beforehand or by using a secure side channel (for example, a courier). However, this is not a feasible option for online situations where two previously unknown entities want to communicate with each other. It is in such situations where key exchange algorithms come into play. A key exchange algorithm allows two parties to collectively establish a shared secret over an insecure channel without it being exposed to a listening third party.

Common key exchange algorithms include:

• RSA/DSA key exchange process: Based on a public-key encryption system. One party sends its public key to the other, which in turn uses it to respond with an encrypted message containing the newly generated secret key.

• Diffie-Hellman key agreement: Invented in 1976 by Whitfield Diffie and Martin Hellman. It is not based on encryption and decryption, but instead relies on mathematical functions that enable the two parties to separately arrive at the same key by sending each other parts that are based on generated random values. Even though a third party can see both transmitted parts they cannot derive the secret key without any of the random values.
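The Diffie-Hellman idea can be sketched with modular arithmetic; the prime and generator below are toy values chosen for illustration (real deployments use 2048-bit or larger groups, or elliptic curves):

```python
import secrets

# Toy Diffie-Hellman parameters: a small public prime p and generator g.
p = 4294967291            # largest prime below 2**32 (insecure size, demo only)
g = 5

a = secrets.randbelow(p - 2) + 2   # Alice's secret random value, never transmitted
b = secrets.randbelow(p - 2) + 2   # Bob's secret random value, never transmitted

A = pow(g, a, p)          # Alice sends A to Bob over the insecure channel
B = pow(g, b, p)          # Bob sends B to Alice; an eavesdropper sees only A and B

# Each side combines its own secret with the other's public part:
# g^(ab) mod p = g^(ba) mod p, so both arrive at the same shared key.
print(pow(B, a, p) == pow(A, b, p))   # True
```

An eavesdropper who sees p, g, A, and B would have to solve the discrete logarithm problem to recover a or b, which is infeasible at real parameter sizes.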

In the ideal situation, where both parties are communicating directly with each other and are who they say they are, key exchange algorithms work very well. However, what if a malicious third party intercepts the messages and completes the key exchange separately with the two original parties? The attacker can then continue to secretly relay, alter, and possibly inject data into the communication between the two parties, who still believe they are communicating directly with each other. Such a scenario is commonly known as a man-in-the-middle (MITM) attack. In the context of key exchange, the issue of authenticating the other party is called the identity authentication problem.

Figure 2.6: Man in the middle attack scenario.

Figure 2.6 shows a typical scenario of such a MITM attack. Alice attempts to perform a key exchange with Bob, but the messages are intercepted by the MITM who, in Alice’s place, initiates the key exchange with Bob using its own parameters. The MITM completes the key exchanges separately with both Alice and Bob, and can thenceforth freely decrypt and encrypt any communication sent between them.

Solutions to the identity authentication problem are typically based on one of two concepts: either a centralized authority solution, e.g., Public Key Infrastructure (PKI), or a web of trust where the responsibility is spread out between all users, e.g., OpenPGP [8].

2.3 Transport Layer Security

Transport Layer Security (TLS) and its predecessor Secure Socket Layer (SSL) are cryptographic protocols designed to provide privacy, data integrity and mutual authentication between two communicating applications on the Internet, typically a client and a server. The protocols operate directly on top of a reliable transport layer protocol, almost always TCP, and support a large number of popular application layer protocols including HTTPS, Internet Message Access Protocol (IMAP), Simple Mail Transfer Protocol (SMTP) and Extensible Messaging and Presence Protocol (XMPP).
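As one concrete example, Python's standard ssl module exposes these protocols; a default client context mirrors what a browser does by loading a root store, requiring certificate verification, and restricting protocol versions (the minimum version set below is an illustrative policy choice):

```python
import ssl

# A default client-side context loads the system CA roots and enables
# full certificate and hostname verification.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse legacy SSL/TLS versions

print(ctx.verify_mode == ssl.CERT_REQUIRED)    # True: server certificate is mandatory
print(ctx.check_hostname)                      # True: certificate must match the hostname
```

These two checks correspond directly to the trust relationships discussed in Chapter 1: the root store determines which CAs are trusted, and hostname verification binds a certificate to a specific server.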

2.3.1 History

SSL/TLS are protocols with long histories. SSL was originally developed by Netscape in 1995 along with their flagship browser Netscape Navigator. The first SSL version was never publicly released, but it is known to have had serious security issues. Their second attempt, SSL 2.0, now deprecated [32], was also rapidly discarded due to a series of severe weaknesses. The first successful version, SSL 3.0 [14] in 1996, was a major rework and fixed a number of conceptual flaws in the earlier versions. Security improvements upon SSL 2.0 included the addition of support for SHA-1 based ciphers and certificate authentication. SSL 3.0 is now also a deprecated protocol [19].

Further development and maintenance of the protocol was handed over to the Internet Engineering Task Force (IETF), which renamed it TLS to avoid legal issues. TLS 1.0 [3] was published in 1999 and is typically seen as a minor editorial update to SSL 3.0. However, the differences were significant enough to preclude interoperability with earlier versions. From a security standpoint, TLS 1.0 was more desirable than SSL 3.0 because TLS 1.0 added SHA-1 support while SSL 3.0 depended on the weak MD5 hash function for master key derivation.

TLS 1.1 [10] was published in 2006 and mainly added protection against CBC attacks and support for IANA registration parameters.

TLS 1.2 [11] was published in 2008 and included significant improvements like support for GCM and CCM modes of AES and the use of more secure hash functions.

The latest version, TLS 1.3 [25], was published in 2018.

2.3.2 Record Protocol

TLS is a layered protocol. Inside a TLS connection all messages are sent using the TLS record protocol. This protocol functions as an intermediary layer between the TCP connection and the sub-protocols of TLS. The record protocol defines the format for how messages are to be framed and it performs the operations that maintain the secure channel. A message using this format is called a TLS record. A record always contains information about the content type, version and length of the record. The record carries messages from the sub-protocols, and if a secure connection has been established a MAC is added. If a block cipher is used then potentially some extra padding is also added.

When messages are transmitted, the record protocol performs the following four operations:

• Fragmentation: Messages are divided into blocks smaller than 2^14 bytes (16 KB) and multiple messages of the same type are potentially coalesced into a single record.


• Compression: Optional compression of messages is performed.

• Message Authentication: A MAC is created using the HMAC algorithm and appended to the record.

• Encryption: The negotiated cipher is used to encrypt the message and MAC.

On the receiving end of the communication channel, the inverse operations are performed in the reverse order to recover the messages. The actual keys, cryptographic hash function and encryption method used to secure the record protocol are agreed upon during the handshake, and therefore the initial messages of a TLS session are sent in the clear.
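The record framing described above can be sketched as a small parser. The following is an illustrative sketch of the 5-byte record header (content type, two version bytes, big-endian length), not production TLS code; the function name and error handling are assumptions:

```python
import struct

def parse_tls_record(data: bytes):
    """Parse a single TLS record from raw bytes (illustrative sketch).

    Record layout: 1-byte content type, 2-byte protocol version
    (major, minor), 2-byte big-endian payload length, then the payload.
    """
    if len(data) < 5:
        raise ValueError("need at least the 5-byte record header")
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    if length > 2**14:  # plaintext fragments must be at most 2^14 bytes
        raise ValueError("record payload exceeds 2^14 bytes")
    payload = data[5:5 + length]
    return content_type, (major, minor), payload

# A TLS 1.2 record of content type 22 (handshake) carrying 4 payload bytes:
record = b"\x16\x03\x03\x00\x04\x0e\x00\x00\x00"
ctype, version, payload = parse_tls_record(record)
```

For example, content type 22 identifies a handshake message, while 20 would be change_cipher_spec and 23 application data.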

2.3.3 Handshake Protocol

The handshake protocol operates on top of the record protocol and is used to establish an SSL/TLS session. The goal of the handshake is to agree upon which protocol version to use, which cryptographic algorithms to employ, cryptographic keys, and compression methods. During the handshake both parties also have the possibility to authenticate each other, although in a client-server situation typically only the server is ever authenticated.

Figure 2.7: Full TLS handshake.

The handshake consists of several messages sent back and forth between the two parties. Figure 2.7 illustrates the full handshake between a typical client and a server. The full handshake is generally initiated by the client sending a ClientHello message that states the client’s intention to use SSL/TLS.

• ClientHello: Contains preference-ordered lists of supported protocol versions, cryptographic algorithms, and compression methods. It also contains the “Client Random” (a nonce), and optional extensions such as Server Name Indication. In the case of session resumption the client can send a previously used session ID to resume the session (see abbreviated handshake below).

In SSL/TLS the different combinations of cryptographic algorithms are called cipher suites, each identified by a unique 16-bit value with an associated symbolic name. For example, the cipher suite DHE_RSA_WITH_AES_256_CBC_SHA means that the record protocol will use HMAC-SHA1 and AES encryption in Cipher Block Chaining mode with a 256-bit key, and that the key exchange is performed using standard Diffie-Hellman. Initially the cipher suite and compression method are set to null, which means no encryption and no compression.
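On the wire each cipher suite is just a 16-bit code point, and implementations map these values to symbolic names. A minimal lookup sketch follows; the values shown follow the IANA registry for these suites, but the selection and the helper name are illustrative:

```python
# Small excerpt of the TLS cipher-suite registry (16-bit IDs).
# Only a few entries are shown for illustration.
CIPHER_SUITES = {
    0x0039: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0xC02F: "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    0x0000: "TLS_NULL_WITH_NULL_NULL",  # the initial null state
}

def suite_name(suite_id: int) -> str:
    """Resolve a 16-bit cipher-suite ID to its symbolic name."""
    return CIPHER_SUITES.get(suite_id, f"UNKNOWN_0x{suite_id:04X}")
```

A monitoring tool such as the one used in this thesis performs essentially this mapping when logging which suites clients offer and servers select.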

If the server finds a combination of protocols and algorithms that it also supports, it will send the ServerHello message in response. Depending on the chosen cipher suite, several other messages may additionally be sent.

• ServerHello: Contains the server-chosen protocol version, session ID, cipher suite, and compression method, as well as the “Server Random” (a nonce), and optional extensions that will be used for the connection.

• Certificate: An optional message that is used for server authentication; since the server is almost always authenticated, it is almost always sent. It contains the server’s certificate, which in turn contains the server’s public key (more details in the Certificate and Authentication section).

• ServerKeyExchange: An optional message only used for the more complicated Key Exchange algorithms, e.g., Diffie-Hellman.

• CertificateRequest: An optional message that the server can send to request that the client also authenticate itself. Contains a list of “Root Certificates” that the server will use to check the validity of the client’s certificate.

• ServerHelloDone: A marker message indicating that the server will not send any more messages and that the client can proceed.

The client then responds with:

• Certificate: An optional message that is used for client authentication and is only sent in response to the server’s CertificateRequest message. It contains the client’s certificate.

• ClientKeyExchange: This message is always sent and contains the client’s part of the actual key exchange. The message content depends on the negotiated cipher suite.

• CertificateVerify: An optional message that contains a digital signature computed over all previous handshake messages. This message is only sent in response to the server’s CertificateRequest message. Its purpose is to prove to the server that the client really owns the public key in the certificate sent.

• ChangeCipherSpec: This message is actually not a handshake message, but has its own message type: change_cipher_spec and will therefore be sent in its own record. Its content is purely symbolic and signals that the client will from now on start to encrypt the messages using the negotiated settings.

• Finished: This message contains a cryptographic checksum computed over all the previous handshake messages. Since it is sent after the ChangeCipherSpec message it is encrypted with the negotiated cipher suite and keys. The purpose of the message is to protect against alterations and to serve as proof that the server has talked to the same client all along.

The server completes the handshake by sending its own ChangeCipherSpec and Finished messages. At this point the client and server can begin exchanging other types of messages.

The full handshake must be performed for all new connections, but if the client recently connected to the target server and would like to reuse the same parameters, the abbreviated handshake can be used. In that case the client sends the previously used session ID in the ClientHello message. If the server also remembers the parameters, it may send the same session ID back in the ServerHello message and then move directly to the ChangeCipherSpec and Finished messages, as shown in Figure 2.8. The main advantages of the abbreviated handshake are the smaller number of messages and the absence of costly asymmetric cryptographic computations. This greatly reduces the overall latency and is therefore frequently used by modern browsers and servers.

Figure 2.8: Abbreviated TLS handshake.

A handshake is not required to be initiated by the client, nor is it limited to a single handshake per connection. At any time within an established SSL/TLS connection, the client can send a new ClientHello message, or the server can send a HelloRequest message, to initiate a new handshake. A typical scenario where this functionality is used is when the server, after seeing the full request path, requires that the client authenticates itself.
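The resumption logic can be modeled as a simple server-side session cache. The sketch below is a toy model of the decision the server makes when it sees a ClientHello with a session ID; the class, its names, and the 300-second lifetime are assumptions, not TLS internals:

```python
import os
import time

class SessionCache:
    """Toy model of a server-side TLS session cache (not a real TLS stack).

    On a full handshake the server stores the negotiated parameters under
    a fresh session ID; on a later ClientHello carrying that ID it can
    skip straight to the ChangeCipherSpec/Finished messages.
    """

    def __init__(self, lifetime_s: int = 300):
        self.lifetime_s = lifetime_s
        self._store = {}  # session_id -> (params, created_at)

    def full_handshake(self, params: dict) -> bytes:
        """Complete a full handshake and remember its parameters."""
        session_id = os.urandom(32)
        self._store[session_id] = (params, time.time())
        return session_id

    def try_resume(self, session_id: bytes):
        """Return cached parameters, or None to force a full handshake."""
        entry = self._store.get(session_id)
        if entry is None:
            return None  # unknown ID: fall back to a full handshake
        params, created = entry
        if time.time() - created > self.lifetime_s:
            del self._store[session_id]
            return None  # expired: fall back to a full handshake
        return params
```

The fallback-to-full-handshake behavior mirrors what happens in practice when a server has evicted a session or the ID has expired.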

2.3.4 Change Cipher Spec Protocol

The change cipher spec protocol is a minimal protocol that is used to signal changes in cipher strategies. The protocol consists of a single message containing a type that has a fixed value. In a standard handshake the protocol is used before the Finished message. It indicates that following messages will be secured using the new cipher suite and keys that have been negotiated.

2.3.5 Application Protocol

The application data protocol is used for sending the application data between the communicating parties after the connection has been established. The protocol simply frames the data by providing type, version and the message length.

2.3.6 Alert Protocol

The alert protocol is used when the connected parties need to notify each other of problems during a connection. Alert messages can be sent at any time. An alert message contains a level and a description. The level informs of the problem’s severity and can be either warning or fatal. A fatal alert always leads to the closing of the connection.

2.4 Public Key Infrastructure

The authentication process performed on servers during the TLS handshake is done utilizing a Public Key Infrastructure (PKI) built with the ITU-T standard X.509, which the IETF adopted for use in TLS. The IETF specifies their X.509 PKI (referred to as PKIX) in the standard RFC 5280 [9] and its update RFC 6818 [33].

2.4.1 Certificate Authorities

The PKIX essentially consists of trusted third parties, called Certificate Authorities (CAs), which are organizations responsible for issuing digital certificates and administering their validity. In the PKIX arrangement the CA agrees to vouch for the identity of the server by issuing a certificate, which essentially is a cryptographic binding of the server’s identity and public key. The binding is achieved using a digital signature scheme where the CA holds the private key and distributes the related public key to clients. If the client trusts the CA, then the server can authenticate itself to the client by providing the certificate, which the client can verify using the CA’s public key.

Figure 2.9: Simplified scenario of PKIX in TLS protocol.

Figure 2.9 shows a simplified usage scenario of the PKIX in the TLS/SSL handshake protocol. The server sends a message to the CA requesting that the CA vouch for the server’s identity by issuing a certificate. The CA performs identity checks, validating the server’s identity, before issuing the certificate. Now a client opens a connection to the server and initiates the TLS/SSL handshake. The server responds and provides the certificate. The client then uses the CA’s public key to verify the validity of the certificate and checks that the certificate’s identity matches the identity of the server it wants to connect to. If the authentication is successful, the client can use the public key contained inside the certificate to perform the key exchange and complete the handshake.

In the PKIX landscape, a strict tree-like hierarchy is assumed where the CAs reside at the top. Each CA controls a handful of authority certificates, which are certificates with the privilege to issue further certificates, vouched for by them directly. These certificates are called root certificates and are used to issue further certificates, which can be either another authority certificate, called an intermediate certificate, or an end-entity certificate, called a leaf certificate. Every new intermediate certificate can issue further intermediate certificates resulting in a chain of trust, referred to as a certificate chain or path.

Figure 2.10 shows a simple hypothetical certificate landscape example. R1, R2 and R3 are root certificates of the respective CAs, while I1, I2, I3 and I5 are intermediate certificates directly signed by the root certificates. L1-L8 are end-entity certificates, e.g., for web sites, vouched for by the corresponding CAs.

Figure 2.10: Certificate landscape example.

The case of I3 signing I4 is called cross signing and is useful in situations where one CA resides in the Root store while the other does not.

The Root store is a browser-dependent selection of root certificates that are automatically trusted when the browser in question is installed. In the example, only R1 and R2 reside in the Root store, so websites with certificates L1-L6 will be considered trusted by the browser, while L7-L8 are untrusted. The reason for the Root store is to relieve common users, who might not even know about certificates, from having to specify themselves which CAs they trust. The pre-specified Root store is not locked, and a user can remove or add trusted CAs at will.

The use of intermediate certificates has its advantages. From a security standpoint it is preferable to keep the private key of a root certificate offline and keep an easily accessible, directly signed intermediate certificate for online signing purposes. It also helps to spread the workload of identity checking and signing procedures to so-called intermediate certificate authorities, especially for globally operating CAs, who can delegate such tasks to local authorities. However, each intermediate certificate can be used to issue a valid certificate for any domain, so every new intermediate certificate increases the number of possible points of attack. If even a single CA, or a subordinate intermediate trusted by a Root store, is compromised, the whole system becomes vulnerable. The attacker can issue valid certificates for any web site, circumventing the identity authentication, and consequently perform MITM attacks despite the whole PKIX implementation.

2.4.2 X.509 Certificate

The certificate used in the PKIX is structured according to the X.509 certificate standard specified in RFC 5280 [9]. Figure 2.11 shows a schematic view of the certificate format.

• Version: Describes the version of the certificate (1, 2, or 3). Depending on the version, not all fields are present.

• Serial Number: Is a positive integer assigned by the CA to each certificate. Together with the issuer name, the serial number uniquely identifies the certificate.

• Signature Algorithm Identifier: Contains the identifier of the algorithm used by the CA to sign the certificate.


Figure 2.11: Schematic view of X.509 version 3 certificate format.

• Validity: Contains the time interval during which the certificate is valid.

• Subject: Identifies the entity associated with the public key stored in the certificate.

• Subject Public Key Info: Contains the value of the public key bound to the subject identity and the identifier of the algorithm with which the key is used.

• Issuer and Subject Unique Identifiers: Used in situations where the issuer and/or subject names are reused over time. These fields are optional and require certificate version 2 or 3.

• Extensions: Contains a sequence of possible certificate extensions. This field is optional and requires certificate version 3.

• Signature: Contains the identifier of the algorithm used to sign this certificate and the actual signature value. It must be the same identifier as in the Signature Algorithm Identifier field.

2.5 Validating Certificates

In the previous section we briefly outlined how a certificate is validated, but did not go into much detail about what is actually done. In reality, the standardized specifications and browsers’ actual implementations vary in many different ways. This is because browsers not only need to implement the functionality, but are also required to support backwards compatibility and a wide variety of sometimes erroneous behavior by both clients and servers. Browsers also develop new features and functionality on their own, separate from the standards, creating further deviations.

2.5.1 Building the Certificate Chain

After receiving a certificate in the SSL/TLS handshake, the client must build the certificate chain from the end-entity certificate, through any intermediates, up to the root certificate. A certificate is not considered trusted unless this chain is complete, i.e., each certificate in the chain is successfully verified by the next until a trusted root certificate is reached. In the ideal case the server will provide all intermediates as well as the end-entity certificate, but in practice this is not always the case, and browsers instead depend on caches and extensions to help build the chain.
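The chain-building step can be sketched with toy certificates represented as dicts; real clients parse X.509 structures, and `build_chain` and its inputs are hypothetical names used only for illustration:

```python
def build_chain(leaf, intermediates, root_store):
    """Link certificates subject <- issuer until a trusted root is reached.

    `leaf` and `intermediates` are toy dicts with "subject" and "issuer"
    keys; `root_store` maps trusted root subjects to their certificates
    (the browser's "Root store"). Returns the complete chain, or None if
    a link is missing and the chain cannot be completed.
    """
    chain = [leaf]
    current = leaf
    while True:
        issuer_name = current["issuer"]
        if issuer_name in root_store:            # reached a trusted root
            chain.append(root_store[issuer_name])
            return chain
        nxt = next((c for c in intermediates
                    if c["subject"] == issuer_name), None)
        if nxt is None or nxt in chain:          # missing link or loop
            return None
        chain.append(nxt)
        current = nxt

root = {"subject": "R1", "issuer": "R1"}
inter = {"subject": "I1", "issuer": "R1"}
leaf = {"subject": "example.com", "issuer": "I1"}
chain = build_chain(leaf, [inter], {"R1": root})
```

The `None` return corresponds to the practical case described above where the server omits an intermediate and the browser cannot complete the chain from what it was given.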

2.5.2 Verifying the Certificates

Along with building the certificate chain, the validity of each individual certificate in the chain must be validated.

Each certificate is only valid for a specific period, as specified by the validity field in the certificate. The validity period is specified by a start date, “not before”, and an end date, “not after”. Depending on the kind of certificate, the length of the period can vary greatly. An end-entity certificate is usually only valid for several months to a few years, while a root certificate is valid for much longer, sometimes decades.
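The validity-window check itself amounts to a simple date comparison. A minimal sketch, assuming timezone-aware datetimes already parsed from the certificate (the helper name is an assumption):

```python
from datetime import datetime, timezone

def is_within_validity(not_before, not_after, now=None):
    """Check whether `now` falls inside a certificate's validity window.

    `not_before` and `not_after` are timezone-aware datetimes taken from
    the certificate's validity field.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not_before <= now <= not_after

nb = datetime(2015, 1, 1, tzinfo=timezone.utc)
na = datetime(2016, 1, 1, tzinfo=timezone.utc)
```

Passing `now` explicitly makes the check reproducible, which matters when re-validating certificates collected in a historical trace such as the one used in this thesis.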

All certificates must also be checked for revocation to ensure that they have not been marked as untrusted. Certificates are revoked when CAs are compromised or when individual certificates are detected being misused. Mechanisms for checking the revocation status of certificates include the standard Certificate Revocation List (CRL) [9] and the OCSP service [28], as well as Google’s Certificate Transparency (CT) system [20].

Apart from checking the expiration and revocation status of the certificates, specific name and path length constraints placed by the CAs on intermediates must also be checked. If a CA has placed a path length constraint on an intermediate certificate, it limits how long the chain below it is allowed to be. Similarly, a name constraint can restrict an intermediate to only issue certificates for specific subdomains. For example, if an intermediate certificate is restricted to the subdomain “*.example.com” and a path length of zero, it is not allowed to issue certificates for “*.other.com” or any further intermediates.
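These constraint checks can be sketched on toy certificates as follows. Real X.509 name constraints are encoded as extensions with their own matching rules, not glob patterns; the glob matching and dict layout here are assumptions for illustration only:

```python
import fnmatch

def check_intermediate(cert, issued_domain, intermediates_below):
    """Check an intermediate's name and path-length constraints (sketch).

    `cert` is a toy dict: "permitted_names" holds glob-style patterns the
    intermediate may issue for, and "path_len" (None = unconstrained)
    limits how many further intermediates may appear below it.
    """
    name_ok = any(fnmatch.fnmatch(issued_domain, pattern)
                  for pattern in cert.get("permitted_names", ["*"]))
    max_below = cert.get("path_len")
    path_ok = max_below is None or intermediates_below <= max_below
    return name_ok and path_ok

# The example from the text: restricted to *.example.com, path length 0.
ca = {"permitted_names": ["*.example.com"], "path_len": 0}
```

With these settings the intermediate may issue for hosts under example.com but not for other domains, and not for any further intermediates.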

2.6 Certificate Issuance

One of the more important aspects of using certificates is the issuance process. To achieve the necessary trust in online transactions, the CAs are required to thoroughly investigate the identity of the server before issuing a certificate. However, the increasing market demand for certificates has led some commercial CAs to introduce simpler and cheaper kinds of certificates that require less stringent identity checks.

2.6.1 Domain Validation

Domain validation certificates are the simplest type of certificate. They provide only a baseline level of proof that you are communicating with the correct server. They are issued as soon as the CA confirms that the certificate requester is the actual owner of the target domain. The advantage of domain validation certificates is that they make secure online communication available to a wider market, but the fact that anyone can get them means that they hold little real weight.

Figure 2.12: Chrome browser certificate validation indicator.

A successfully verified domain validation certificate is usually indicated with a “padlock” somewhere in the address bar. In the case of Google’s Chrome browser it is a green padlock right before “https...” as shown to the right in Figure 2.12.


2.6.2 Organization Validation

Organization validation certificates are issued to companies and provide a higher level of proof than domain validation. These certificates require that the ownership of the domain, as well as the company itself, is verified by the CA before being issued. The advantage of organization validation over domain validation is that the certificates not only guarantee domain ownership, but also provide a certain level of trust about the company. Some browsers indicate a successfully verified organization validated certificate by coloring the address bar.

2.6.3 Extended Validation

Extended Validation (EV) certificates were introduced in 2007 as an initiative to provide a high standard certificate for organizations where secure communication is essential for the business, e.g., online banking, and to some degree restore the waning user trust in a certificate.

To obtain an EV certificate the company must go through an extensive vetting process in which all details about the company are verified. Not every CA is allowed to offer EV certificates; this is restricted to CAs who have passed an independent qualified audit review. While EV certificates may seem similar to organization validation certificates, the key difference is the level of validation that is required. The EV certificate itself also includes extra identification information that further allows the certificate to be identified on an organizational level during the certificate validation process.

The guidelines for EV certificates are managed by the CA/Browser Forum. A successfully verified EV certificate is usually indicated, in addition to the “padlock”, by the name of the company and the issuing CA in the colored display in the address bar. The left bar in Figure 2.12 shows how Google’s Chrome browser indicates an EV certificate.


3 Method

The objective of this thesis is to observe how the HTTPS protocol is used in practice. For this purpose we used data collected passively from a network. This chapter describes how the data was collected, processed, and analyzed.

3.1 Data Collection

The data used in this thesis was exclusively collected at the University of Calgary, Canada, by passively monitoring the traffic between the campus network and the Internet. Passive monitoring is a technique used to collect data from a monitored network by copying the traffic via a network tap. The network tap is an external hardware device that is inserted at a specific point in a network to mirror the traffic that passes through it, in this case the university’s multi-Gbps ingress/egress link. The traffic on this network covers a group of more than 30,000 users (students, staff and professors). This provides a good picture of what typical HTTPS communication looks like, both in terms of websites accessed and devices used (e.g., smartphones, tablets, desktops, servers).

Privacy Concerns: We only gathered statistical data for the purpose of analyzing properties of the TLS/SSL communication. We do not conduct any analyses to identify the activities of individual users on the campus. The rules regarding the distribution of data collected at the University of Calgary are very strict and do not allow recorded data containing IP addresses of users to leave the university. All the data processing, as described in Section 3.2, was done on the University of Calgary’s servers and only the final aggregated log files, containing exclusively statistical data, left the campus. Furthermore, any actionable information regarding security on the campus network was shared with the campus IT staff.

3.2 Data Processing

The network traffic was processed using the network security monitor Zeek version 2.4.1 (called Bro at the time of analysis) [31]. Zeek provides a comprehensive analysis framework for both general network traffic analysis and more specialized analysis of the TLS/SSL communication. Using the Zeek framework allowed us to log specific information about the non-encrypted part of the TLS/SSL handshake and all digital certificates sent.

3.2.1 Zeek Scripts

The scripting language in Zeek uses an event-driven approach, where writing scripts involves handling the events generated by Zeek as it processes network traffic. All events are placed into an ordered “event queue”, allowing scripted event handlers to process the events on a first-come-first-served basis. Typical events are generated, for example, when a new HTTP or HTTPS connection is initiated or closed. We use these events for creating new data storage objects and writing the data to a log file, respectively.

We developed several Zeek scripts for the purpose of recording data from HTTPS communication. Each script produces log files, stored and compressed in intervals. The three main scripts are as follows: a script to summarize statistics of TLS/SSL communications, a script to record statistics of certificate usage, and a script to record the ratio between HTTP and HTTPS traffic.

TLS/SSL Communications Script: A script that records a summary of every HTTPS session initiated and established. The information is stored in a log file with one entry per session. The summary record for each session includes:

• Client Protocol Version: The highest TLS/SSL protocol version that the client supports.

• Protocol Version Used: The TLS/SSL protocol version that the server chooses to use for the session.

• Client List of Cipher Suites: The preference-ordered list of cipher suites that the client supports.

• Cipher Suite Used: The cipher suite that the server chooses to use for the session.

• Certificate Chain Length: The length of the certificate chain.

• User-Agent String: The user-agent string of the client, used to distinguish between mobile/stationary clients and their browser versions. See Section 3.2.2 for how this is derived.

• Validation Status: Status of the session validation.

• Heartbleed Status: Status of possible Heartbleed attacks during this session.

Certificate Statistics Script: A script that records a summary of every individual certificate sent to the browser during the authentication stage of the TLS handshake. The information is stored in a log file with one entry per certificate. For each certificate the summary includes:

• Signature Algorithm: The algorithm used to sign the certificate.

• Public Key Algorithm: The algorithm used for the public key.

• Period Validity: The period validity status and validity period duration.

• Extended Validation: Whether the certificate is an EV certificate.

• Basic Constraints: Whether the certificate utilizes any basic constraints, for example the path length constraint.

• Subjects: The subject or subjects associated with the certificate.


Figure 3.1: Log file processing.

Ratio between HTTP and HTTPS Traffic Script: A script that periodically summarizes the share of HTTPS connections compared to HTTP connections. The information is stored in one-hour batches.
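The one-hour batching can be sketched as a simple aggregation; this is an illustrative Python sketch rather than the actual Zeek script, and the record layout (timestamp plus scheme label) is an assumption:

```python
from collections import Counter

def hourly_ratio(sessions):
    """Aggregate (unix_timestamp, scheme) pairs into one-hour batches.

    `sessions` is an iterable of (timestamp, "http" | "https") pairs;
    returns a dict mapping each hour's start timestamp to per-scheme
    counts, from which the HTTPS share per hour can be computed.
    """
    buckets = {}
    for ts, scheme in sessions:
        hour_start = int(ts) // 3600 * 3600  # floor to the hour
        buckets.setdefault(hour_start, Counter())[scheme] += 1
    return buckets

sessions = [(10, "http"), (20, "https"), (3700, "https")]
buckets = hourly_ratio(sessions)
```

Dividing `buckets[h]["https"]` by the hour's total then yields the HTTPS share for hour `h`.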

3.2.2 Identifying Mobile and Stationary Devices

Identifying mobile and stationary user devices based only on passive HTTPS information is non-trivial, since the necessary information, the user-agent string, is sent in the encrypted part of the session establishment. In this thesis we leverage the fact that in typical web sessions a client, even when only visiting a single website, issues requests to many different servers; some accessed with HTTP and some with HTTPS. For HTTPS session classification the script therefore temporarily (in a 5-minute rolling window, not written to disk) keeps track of IP-to-user-agent mappings of observed HTTP sessions and compares the IP address of all new HTTPS sessions with recent HTTP sessions. Matches are assumed to originate from the same client, and HTTPS sessions are classified as either mobile or stationary based on the corresponding user-agent string.
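The rolling-window classification can be sketched as follows. The class, its method names, and the user-agent keywords are illustrative assumptions, not the actual Zeek script:

```python
from collections import deque

class DeviceClassifier:
    """Sketch of the 5-minute rolling IP-to-user-agent window described
    above: HTTP sessions populate the mapping, and later HTTPS sessions
    from the same IP are classified from the remembered user agent."""

    MOBILE_HINTS = ("Mobile", "Android", "iPhone", "iPad")  # heuristic

    def __init__(self, window_s=300):
        self.window_s = window_s
        self._events = deque()   # (timestamp, ip), oldest first
        self._agents = {}        # ip -> most recent user-agent string

    def observe_http(self, ip, user_agent, now):
        """Record the user agent seen in a plaintext HTTP session."""
        self._events.append((now, ip))
        self._agents[ip] = user_agent

    def classify_https(self, ip, now):
        """Classify a new HTTPS session as mobile/stationary/unknown."""
        # Expire mappings that fell out of the rolling window.
        while self._events and now - self._events[0][0] > self.window_s:
            _, old_ip = self._events.popleft()
            if not any(e[1] == old_ip for e in self._events):
                self._agents.pop(old_ip, None)
        ua = self._agents.get(ip)
        if ua is None:
            return "unknown"
        return "mobile" if any(h in ua for h in self.MOBILE_HINTS) else "stationary"
```

Sessions whose IP has no recent HTTP counterpart remain unclassified, which is why only a subset of the HTTPS sessions in the dataset could be labeled mobile or stationary.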

3.2.3 Local University of Calgary Servers

To avoid analysis results skewed towards the behavior of servers local to the University of Calgary, we decided to filter out all such sessions based on IP prefix and focus on the traffic between local clients and remote servers (i.e., servers located outside the campus).

3.3 Analyzing the Data

Each Zeek script produces log files stored and compressed in intervals, thus producing many separate log files covering the whole time period. In order to analyze the data we first processed all the separate log files to compile one data file for each log-file type. This process is illustrated in Figure 3.1. After the aggregation process, analysis scripts could be run on each data file individually as well as together for cross-data-file analysis.


4 Results

In this chapter we describe the results from our analysis. Following a summary of our dataset, the chapter is divided into one section for each trust relationship, and ends with a section summarizing the observed overall HTTPS session quality. For each relationship we describe its relevance for the security of HTTPS, and present the results of our analysis.

An important distinction that we sometimes make is whether we analyzed the distribution of unique certificates or observed certificates. The former refers to a distribution over each individual certificate regardless of how many times it has been observed. The latter refers to the full distribution of observed certificates, and the result is consequently weighted towards the set of certificates observed more frequently.
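The distinction can be expressed with a simple counting sketch, where fingerprints stand in for certificate identities (the function name and inputs are illustrative):

```python
from collections import Counter

def cert_stats(fingerprints):
    """Summarize certificates two ways.

    The unique view counts each distinct certificate once; the observed
    view weights each certificate by how often it appeared in sessions.
    `fingerprints` lists one entry per certificate occurrence.
    """
    counts = Counter(fingerprints)
    n_unique = len(counts)        # unique-certificate view
    n_observed = sum(counts.values())  # observed view
    return n_unique, n_observed, counts

# A CA certificate reappears in every chain it signed:
u, o, c = cert_stats(["ca1", "leaf_a", "ca1", "leaf_b", "ca1"])
```

This is why, in the dataset below, a small number of authority certificates accounts for a large share of total observations.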

4.1 Summary of Dataset

For this thesis we gathered data during a one-week period (Oct. 11-17, 2015). Table 4.1 summarizes our datasets, broken down on a per-session and per-certificate basis. In total, we observed 232,640,189 HTTPS sessions. Of these sessions, 157,225,583 (67.58%) contained certificates, while the rest (32.42%) were session resumptions using the abbreviated TLS handshake to resume a previously established HTTPS session (see Section 2.3.3).

Table 4.1: Dataset overview.

Sessions            Observed (share)       Observed Mobile       Observed Stationary
With Certificates   157,225,583 (67.58%)   32,877,565 (70.08%)   74,432,271 (67.94%)
Resumption          75,414,606 (32.42%)    14,036,068 (29.92%)   35,117,577 (32.06%)
Total               232,640,189            46,913,633            109,549,848

Certificates        Unique                 Observed
Leaf                66,912 (98.89%)        319,612,494 (57.86%)
Authorities         750 (1.11%)            232,774,694 (42.14%)
Total               67,664                 552,387,188

We further managed to identify 46,913,633 sessions from clients using mobile devices and 109,549,848 sessions from stationary devices. These subsets showed a similar ratio between sessions with certificates and resumptions: 32,877,565 (70.08%) of the mobile sessions and 74,432,271 (67.94%) of the stationary sessions contained certificates.

Figure 4.1: Number of established sessions plotted over time. Shows the total number of sessions as well as the subsets of sessions using HTTP and HTTPS.

In total, across all sessions, we observed 67,664 unique certificates. Together these certificates were observed a total of 552,387,188 times, with the majority of sessions sending multiple certificates in the respective certificate chains. Of these, 750 (1.11%) were authority certificates, while the remaining 66,912 (98.89%) were leaf certificates. In contrast, the smaller set of authority certificates was observed in total 232,774,694 (42.14%) times, while the much larger share of leaf certificates was observed 319,612,494 (57.86%) times. The skew in shares between unique and total observed certificates is due to the fact that many leaf certificates are signed by the same authority certificate.

4.2 Ratio between HTTP and HTTPS

While this thesis is primarily focused on HTTPS, an interesting aspect to look at is the ratio of sessions using HTTPS compared to HTTP. Inspecting the established connections also allows us to confirm whether the collected data seems plausible and thereby conclude that the data gathering process was successful. For this reason we recorded each established session with a timestamp. Figure 4.1 shows the number of established sessions, as well as the subsets of sessions using HTTP and HTTPS, plotted over time.

In our dataset, Oct 11 and Oct 17 were Sundays and Oct 12 a statutory holiday. With the exception of Oct 15, our data shows one spike for each day. For Oct 11, 12, 15 and 17 the ratios of HTTP and HTTPS sessions are very similar. For Oct 13, 14, and 16 the ratio of HTTPS sessions is higher than that of HTTP sessions.


Table 4.2: Browser share.

Name               Observed     Share
Chrome             178,042,643  (51.48%)
Safari             77,990,330   (22.55%)
Firefox            65,638,870   (18.98%)
Internet Explorer  23,255,519   (6.72%)
Opera              890,904      (0.26%)
SeaMonkey          29,251       (0.01%)
Chromium           8,772        (0.00%)

4.3 Trust in Browsers

The browser plays a key role in the HTTPS landscape, and is perhaps the most explicit choice of trust a user makes. The browser is responsible for the implementation of HTTPS as well as for the selection of which CAs are currently considered trusted. When a new security vulnerability is found, it is important that a browser immediately patches its implementation to protect against exploitation. For this reason it is important that browsers are kept up-to-date with the latest versions. With this in mind, we investigated how up-to-date browsers actually are in practice. When considering the browser version we looked at the latest officially released stable version and regard a browser as behind if it does not have the latest security update. We do not take Beta or developer versions into account. The data considered in this analysis is taken both from the observed HTTP sessions and from the subset of HTTPS sessions where we could identify the user agent string, as described in Section 3.2.2.
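A minimal sketch of this classification, assuming simplified user agent patterns and an illustrative (not official) release list; the actual analysis used the vendors' full release histories:

```python
import re

# Illustrative ordered release history, newest first (assumption:
# only a few Chrome versions shown, matching the build prefixes
# reported in the tables below).
CHROME_RELEASES = ["46.0.2490", "45.0.2454", "44.0.2403",
                   "43.0.2357", "42.0.2311"]

def parse_browser(user_agent):
    """Return (name, version prefix) extracted from a User-Agent string."""
    m = re.search(r"Chrome/(\d+\.\d+\.\d+)", user_agent)
    if m and "Edge" not in user_agent:
        return "Chrome", m.group(1)
    m = re.search(r"Firefox/(\d+\.\d+)", user_agent)
    if m:
        return "Firefox", m.group(1)
    return "Other", None

def updates_behind(version, releases):
    """Number of stable releases between `version` and the newest one."""
    return releases.index(version) if version in releases else None

ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36")
name, ver = parse_browser(ua)        # -> ("Chrome", "45.0.2454")
behind = updates_behind(ver, CHROME_RELEASES)  # -> 1 update behind
```

Checking `"Edge"` before classifying as Chrome reflects that several browsers embed other vendors' tokens in their user agent strings, so pattern order matters.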

Browser Distribution: Table 4.2 shows the distribution of the different browsers observed in our dataset. Unsurprisingly, the result is heavily skewed towards a small set of very popular browsers, all developed and supported by large corporations. The most popular browser is Google's Chrome, observed in 178,042,643 sessions, or 51.48% of all observed sessions. Chrome is followed by Apple's Safari, observed in 77,990,330 (22.55%) sessions, and Mozilla's Firefox, observed in 65,638,870 (18.98%) sessions. The fourth most used browser is Microsoft's Internet Explorer, observed in 23,255,519 (6.72%) sessions. We further observed three minor browsers: Opera, observed in 890,904 (0.26%) sessions; SeaMonkey, observed in 29,251 (0.01%) sessions; and Chromium, observed in 8,772 (0.00%) sessions.

Google Chrome: Table 4.3 shows the distribution of the ten most observed Chrome browser versions, sorted by number of updates behind. In October 2015, the latest stable version of Chrome was Chrome/46.0.2490. This version was observed in 39,889,597 (22.40%) of sessions, the second largest share. The most used version was Chrome/45.0.2454, observed in 115,154,160 (64.68%) sessions; this represents a browser that was one update behind the latest version. The third largest share, Chrome/42.0.2311, was observed in 7,414,204 (4.16%) sessions and is four updates behind. Two updates behind, Chrome/44.0.2403, and three updates behind, Chrome/43.0.2357, were observed in 3,240,668 (1.82%) and 1,951,253 (1.10%) sessions respectively. The oldest version seen in the dataset was Chrome/0.2.149, observed in 680 (0.00038%) sessions.

Apple Safari: Table 4.4 shows the distribution of the ten most observed Safari browser versions, sorted by number of updates behind. The latest version of Safari in October 2015 was Safari/9.0.1, which was observed in 486,346 (0.62%) sessions and is not among the top ten. The most used version was Safari/9.0, one update behind, observed in 32,378,275 (41.52%) sessions. The second largest share, Safari/8.0, is ten updates behind


Table 4.3: Chrome version distribution.

Name              Observed     Share     Behind
Chrome/46.0.2490  39,889,597   (22.40%)  0
Chrome/45.0.2454  115,154,160  (64.68%)  1
Chrome/44.0.2403  3,240,668    (1.82%)   2
Chrome/43.0.2357  1,951,253    (1.10%)   3
Chrome/42.0.2311  7,414,204    (4.16%)   4
Chrome/41.0.2272  859,049      (0.48%)   5
Chrome/39.0.2171  913,737      (0.51%)   7
Chrome/38.0.2125  1,662,113    (0.93%)   8
Chrome/34.0.1847  1,326,076    (0.74%)   12
Chrome/31.0.1650  1,672,425    (0.94%)   15

Table 4.4: Safari version distribution.

Name          Observed    Share     Behind
Safari/9.0    32,378,275  (41.52%)  1
Safari/8.0.8  8,080,378   (10.36%)  2
Safari/8.0.7  3,954,857   (5.07%)   3
Safari/8.0.6  1,518,559   (1.95%)   4
Safari/8.0.5  1,993,624   (2.56%)   5
Safari/8.0.3  1,488,709   (1.91%)   7
Safari/8.0    8,786,587   (11.27%)  10
Safari/7.1.8  1,698,240   (2.18%)   11
Safari/4.0    3,735,701   (4.79%)   17
Safari/7.0    1,941,555   (2.49%)   26

Table 4.5: Firefox version distribution.

Name          Observed    Share     Updates Behind
Firefox/41.0  43,911,471  (66.90%)  2
Firefox/40.0  11,983,891  (18.26%)  5
Firefox/39.0  912,329     (1.39%)   7
Firefox/38.0  2,136,386   (3.25%)   26
Firefox/37.0  290,093     (0.44%)   29
Firefox/36.0  403,193     (0.61%)   34
Firefox/34.0  2,453,535   (3.74%)   38
Firefox/33.0  266,921     (0.41%)   44
Firefox/22.0  338,179     (0.52%)   87
Firefox/12.0  395,862     (0.60%)   121

Table 4.6: Internet Explorer version distribution.

Name       Observed    Share     Updates Behind
MSIE 11.0  511,508     (2.20%)   0
MSIE 10.0  10,645,456  (45.76%)  1
MSIE 9.0   3,296,946   (14.17%)  2
MSIE 8.0   1,907,360   (8.20%)   3
MSIE 7.0   5,063,839   (21.77%)  4
MSIE 6.0   1,682,342   (7.23%)   8
MSIE 5.5   6,625       (0.03%)   9
MSIE 5.0   141,241     (0.61%)   11
MSIE 4.0   5,279       (0.02%)   13

and was observed in 8,786,587 (11.27%) sessions. Two updates behind, Safari/8.0.8, and three updates behind, Safari/8.0.7, were observed in 8,080,378 (10.36%) and 3,954,857 (5.07%) sessions respectively. The oldest version seen in the dataset was Safari/1.0, observed in 35 (0.000045%) sessions.

Mozilla Firefox: Table 4.5 shows the distribution of the ten most observed Firefox browser versions, sorted by number of updates behind. In October 2015 the current release of Firefox was Firefox/41.0.2. This version was rarely observed (only 680 sessions) and is not among the top ten. The most used version was Firefox/41.0, two updates behind, observed in 43,911,471 (66.90%) sessions. The second most used version was Firefox/40.0, five updates behind, observed in 11,983,891 (18.26%) sessions. Beyond the two largest shares, even older versions were observed that were seven or more updates behind. The oldest version seen in the dataset was Firefox/0.8, observed in 1,604 (0.0024%) sessions.

A difference between Firefox and the other browsers is the frequency with which security updates are released. Firefox releases a considerably larger number of security updates than the others, which is reflected in the overall higher "updates behind" counts for the Firefox versions.

Microsoft Internet Explorer: Table 4.6 shows the distribution of the ten most observed Internet Explorer browser versions, sorted by number of updates behind. The latest version of Internet Explorer available in October 2015 was MSIE 11.0, observed in 511,508 (2.20%) sessions, the sixth largest share. The most used version was MSIE 10.0, observed in 10,645,456 (45.76%) sessions, one version behind. The second most used version was MSIE 7.0, observed in 5,063,839 (21.77%) sessions, four versions behind. Two updates behind, MSIE 9.0, and three updates behind, MSIE 8.0, were observed in 3,296,946 (14.17%) and 1,907,360 (8.20%) sessions respectively.
