
3 Security in Cloud Infrastructure

3.2 Cloud Storage Protection

Storing data backups in remote infrastructure has a decades-long history [117]; early works primarily addressed reliability through disaster recovery [118–120], with little regard for the security of the remotely stored data. However, the dynamicity and multi-tenancy of cloud infrastructure, along with new uses of storage for internal cloud management operations, have introduced new attack vectors.

Whether used for storing internal cloud infrastructure management data or allocated to VM instances, block storage is often represented by logical volumes assembled from multiple disk partitions that may be physically scattered throughout the deployment and replicated within or across datacenters. Beyond the obvious challenges of pinpointing the physical location of data at any moment in time, peer tenants, rogue system administrators and state-level adversaries may attempt to break the integrity and confidentiality of data stored on block storage allocated to VM instances.

Industry security research reported in 2011 a vulnerability whereby block storage allocated to a VM instance was later re-allocated to new VM instances while still containing the old data [21]. Disk wiping [121], or, in the case of encrypted volumes, reliable destruction of the encryption key, can effectively resolve such issues. Below follows a brief account of notable research efforts towards protection of data in remote infrastructure.

Kamara and Lauter describe a cryptographic cloud storage architecture in [122]. The architecture provides confidentiality and integrity of data stored on remote infrastructure, while also providing availability, reliability, efficient retrieval and data sharing. Its core workflow is as follows: customer data is transferred to the cloud storage through a data processor; once deployed, its integrity can be verified at any moment using a data verifier component; search and retrieval of customer data segments is done using search tokens issued by a token generator; finally, third parties can access and query the customer data using credentials issued by a credential generator according to a user-defined access control policy. The proposed architecture is enabled by several advances in cryptography supporting the requirements for cloud storage, namely searchable encryption [123, 124], attribute-based encryption [125] and proofs of storage [126].
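The token-based search step in this workflow can be illustrated with a minimal sketch using deterministic HMAC keyword tokens. This is a deliberate simplification (all class and variable names here are illustrative), not the searchable encryption constructions of [123, 124], which provide stronger security guarantees than deterministic tokens:

```python
import hashlib
import hmac
import os

class TokenGenerator:
    """Client-side: holds the customer's secret key and issues search tokens."""
    def __init__(self, key: bytes):
        self.key = key

    def token(self, keyword: str) -> bytes:
        # Deterministic keyword token: HMAC(key, keyword).
        return hmac.new(self.key, keyword.encode(), hashlib.sha256).digest()

class CloudIndex:
    """Server-side: maps opaque tokens to document ids. The server never
    sees plaintext keywords or the key, only the tokens."""
    def __init__(self):
        self.index = {}

    def add(self, token: bytes, doc_id: str):
        self.index.setdefault(token, []).append(doc_id)

    def search(self, token: bytes):
        return self.index.get(token, [])

key = os.urandom(32)
tg = TokenGenerator(key)
cloud = CloudIndex()

# Data processor role: index documents under keyword tokens before upload.
cloud.add(tg.token("invoice"), "doc-1")
cloud.add(tg.token("invoice"), "doc-2")
cloud.add(tg.token("audit"), "doc-3")

print(cloud.search(tg.token("invoice")))  # ['doc-1', 'doc-2']
print(cloud.search(tg.token("payroll")))  # []
```

Note that deterministic tokens leak search patterns to the server, which is precisely the kind of leakage the schemes in [123, 124] aim to bound.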

With CloudProof, Popa et al. address the lack of security aspects in the SLAs of cloud service providers. The proposed system detects and proves instances of security property violation under the threat model of an untrusted cloud provider. The addressed security properties are: confidentiality, achieved by client-side content encryption prior to deployment in the cloud; integrity, achieved through a combination of client-owned private signature keys used to sign updated data blocks and public verification keys used to verify the integrity of data blocks at read time; write-serializability (i.e. reliable versioning of stored data), achieved using attestation chains, i.e. chains of hashes over the data block version number and content computed at each data block update; and freshness (i.e. the guarantee to provide the latest data block version), verified by checking the correctness of the attestation chain for a selected data block.
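The attestation chain mechanism can be sketched as a hash chain over (version, content) pairs, where each link binds the previous attestation to the current update. The sketch below omits the signatures CloudProof uses and all names are illustrative:

```python
import hashlib

def attestation(prev_hash: bytes, version: int, content: bytes) -> bytes:
    # Each attestation hashes the previous attestation, the version
    # number, and the block content, forming a tamper-evident chain.
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(version.to_bytes(8, "big"))
    h.update(content)
    return h.digest()

def verify_chain(updates):
    """Recompute the chain over (version, content, attestation) tuples.
    Write-serializability requires versions to increase by exactly one."""
    prev = b"\x00" * 32
    expected_version = 1
    for version, content, att in updates:
        if version != expected_version:
            return False  # a write was dropped or reordered
        if attestation(prev, version, content) != att:
            return False  # content or version was tampered with
        prev, expected_version = att, version + 1
    return True

# Client-side bookkeeping while writing three versions of one block.
chain, prev = [], b"\x00" * 32
for v, data in [(1, b"aaa"), (2, b"bbb"), (3, b"ccc")]:
    att = attestation(prev, v, data)
    chain.append((v, data, att))
    prev = att

print(verify_chain(chain))  # True
# Substituting different content under an old attestation breaks the chain.
tampered = [chain[0], (2, b"XXX", chain[1][2]), chain[2]]
print(verify_chain(tampered))  # False
```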

Approaches such as FADE [127] implement policy-based file assured deletion, in order to effectively prevent access to the stored files by the cloud service provider upon revocation of file access policies.
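The core idea, a per-file key wrapped under a policy key held by an external key manager, so that deleting the policy key renders every file under that policy unrecoverable, can be sketched as follows. The XOR wrapping and all names below are illustrative toys, not FADE's actual cryptographic construction:

```python
import os

def xor_wrap(a: bytes, b: bytes) -> bytes:
    # Toy key wrapping via XOR; a real system would use authenticated
    # encryption or the blinded-key techniques of FADE itself.
    return bytes(x ^ y for x, y in zip(a, b))

# Policy keys live at an external key manager, never in the cloud store.
policy_keys = {"policy-hr": os.urandom(32)}

file_key = os.urandom(32)
# Only the wrapped file key is stored alongside the encrypted file.
wrapped = xor_wrap(file_key, policy_keys["policy-hr"])

# Normal access: the key manager unwraps under an active policy.
assert xor_wrap(wrapped, policy_keys["policy-hr"]) == file_key

# Policy revocation: the key manager destroys the policy key, after which
# the wrapped file key (and hence the file) is unrecoverable.
del policy_keys["policy-hr"]
print("policy-hr" in policy_keys)  # False: files under the policy are assuredly deleted
```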

Excalibur combines policy-sealed data with Ciphertext Policy Attribute-Based Encryption [128] in order to protect data in cloud infrastructure [129]. Policy-sealed data is a trusted computing abstraction designed for cloud services that leverages TPM functionality. It allows data to be sealed (i.e., encrypted to a customer-defined policy) and then unsealed (i.e., decrypted) only by hosts whose configuration matches a given policy.
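The seal/unseal workflow can be sketched with a hash of the host configuration standing in for a TPM-based measurement. The monitor class and all names are illustrative assumptions; the sketch omits Excalibur's actual use of CPABE and remote attestation:

```python
import hashlib

def measure(host_config: dict) -> bytes:
    # Toy "measurement": a hash over the host's software configuration,
    # standing in for TPM PCR values produced by a measured boot.
    return hashlib.sha256(repr(sorted(host_config.items())).encode()).digest()

class SealingMonitor:
    """Trusted party that releases sealed secrets only to hosts whose
    measurement satisfies the customer-defined policy."""
    def __init__(self):
        self.sealed = []

    def seal(self, secret: bytes, allowed_measurements: set) -> int:
        self.sealed.append((secret, allowed_measurements))
        return len(self.sealed) - 1

    def unseal(self, handle: int, host_config: dict) -> bytes:
        secret, allowed = self.sealed[handle]
        if measure(host_config) not in allowed:
            raise PermissionError("host configuration does not match policy")
        return secret

good_host = {"hypervisor": "xen-4.2", "patched": True}
bad_host = {"hypervisor": "xen-4.2", "patched": False}

monitor = SealingMonitor()
handle = monitor.seal(b"tenant-data-key", {measure(good_host)})

print(monitor.unseal(handle, good_host) == b"tenant-data-key")  # True
try:
    monitor.unseal(handle, bad_host)
except PermissionError:
    print("unseal denied for non-conforming host")
```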

Mylar is a security framework that combines data protection in the cloud with end-to-end protection of client data access [130]. Mylar consists of four main components: (i) a server-side library implements keyword search over encrypted data on the server; (ii) a client-side library intercepts the data transfer with the server and manages data encryption and decryption, as well as client key management; (iii) a browser extension verifies the integrity of the client-side web application code; and (iv) an optional identity provider verifies the link between user names and keys. This approach can benefit both from the continuous advances in searchable encryption for keyword search [123, 124] and from advances in hardware-supported isolated execution environments [50] that can help reduce the reliance on the identity provider component.

The contributions described above presented a variety of approaches to storing, searching through, and sharing confidentiality- and integrity-protected data in cloud deployments. In Paper B we present a complementary approach, based on combining several previous results [131–133], that implements a comprehensive protection mechanism including virtualization host attestation prior to virtual machine instantiation (verifiable trusted launch) and user-controlled storage protection transparent to the VM instances (domain-based storage protection). Verifiable trusted launch, initially introduced in Paper A, provided a means for users to verify that the VM instance they communicate with has been launched following the trusted launch protocol on a platform with an attested TCB (see Section 3.2). Domain-based storage protection, initially described in [132] and the related patent [133], allowed encryption of persistent VM block storage by the hypervisor, transparent to the guest VM and independent of the implementation of the encryption libraries in the VMs.

By shifting disk encryption to the underlying hypervisor, where it is managed by a dedicated secure component, the approach described in Paper B reduces the attack surface by maintaining all key material in the secure component rather than in the virtual machine instances. At the same time, encrypting the virtual disks mounted to VM instances locally on the virtualization host makes it possible to reduce the cost of cloud storage by storing data in other deployments with more relaxed security guarantees. Finally, maintaining control of the data encryption keys externally from the virtualization host allows tenants to seamlessly swap cloud services without the hurdle of secure data migration between infrastructure deployments.
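The transparency property can be illustrated with a toy per-sector encryption sketch: the guest observes plaintext reads and writes, while the backing store only ever holds ciphertext. The hash-based keystream below is an illustrative stand-in for a real sector cipher such as AES-XTS, and all names are assumptions rather than the implementation of Paper B:

```python
import hashlib
import os

SECTOR = 16  # toy sector size; real block devices use 512 B or 4 KiB sectors

def keystream(key: bytes, sector_no: int, length: int) -> bytes:
    # Toy counter-mode keystream derived with SHA-256; purely illustrative,
    # not a production cipher.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + sector_no.to_bytes(8, "big")
                              + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

class EncryptedVolume:
    """Hypervisor-side wrapper: the guest VM reads and writes plaintext
    sectors, while only ciphertext ever reaches the storage backend."""
    def __init__(self, key: bytes, sectors: int):
        self.key = key
        self.store = [bytes(SECTOR)] * sectors  # what the backend sees

    def write(self, n: int, plaintext: bytes):
        ks = keystream(self.key, n, SECTOR)
        padded = plaintext.ljust(SECTOR, b"\x00")
        self.store[n] = bytes(a ^ b for a, b in zip(padded, ks))

    def read(self, n: int) -> bytes:
        ks = keystream(self.key, n, SECTOR)
        return bytes(a ^ b for a, b in zip(self.store[n], ks))

vol = EncryptedVolume(os.urandom(32), sectors=4)
vol.write(0, b"guest data")
print(vol.read(0).rstrip(b"\x00"))  # b'guest data': decryption is transparent to the guest
# vol.store[0] holds only ciphertext; the guest never handles the key.
```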

Beyond using TPM functionality for key sealing, the approach described in Paper B relies on a secure component: a verifiable execution module performing confidentiality and integrity protection operations on VM instance data, as well as key management. The use of a secure component in this work was inspired by approaches such as SecVisor [112] and CloudVisor [107], which relied on a verified software module executing at the highest privilege level in order to protect it from a potentially malicious host operating system. In a novel implementation, this approach can leverage the increasing hardware support for TEEs on commodity platforms [115, 134] by deploying the secure component in one such isolated execution environment. The solution is generic enough to be deployed with a variety of available or upcoming TEE implementations.

We used the OpenStack block storage service (Cinder) to implement domain-based storage protection in both Paper B and [132].

In Paper C we extended the solution with access control for multi-tenancy support. In particular, we introduce extensions that allow tenants to control, at instance launch time, the instance's read and write access rights over a storage device.

Data geolocation

Concerns about the physical location of data and its availability in different jurisdictions [135] gained further importance as state regulations on data placement caught up with technological developments. Such regulations are referred to as data localization [136, 137], defined as “a policy whereby national governments compel Internet content hosts to store data about Internet users in their country on servers located within the jurisdiction of that national government” [137]. Several countries have adopted data localization regulations regarding the storing, processing, or handling of certain types of data (the specific types of data differ depending on the country). Examples of such regulations include: the Personally Controlled Electronic Health Record Provision in Australia [138]; the Cybersecurity Law in China [139]; the Telecommunications Act in Germany [140]; the National Data Sharing and Accessibility Policy in India [141]; the Information and Electronic Transaction Law in Indonesia [142]; Federal Law No. 242-FZ in Russia [143]; and the Defense Federal Acquisition Regulation Supplement: Network Penetration Reporting and Contracting for Cloud Services in the United States of America [144] (see [136] for more details).

While data localization is a fairly recent term, a large body of research has investigated the closely related aspect of data location, which remains an unsolved issue in the context of cloud computing. This is partly caused by the architecture of cloud computing deployments, which rely extensively on data replication and load balancing to ensure elastic scalability [64] and high availability at all times. Watson et al. [145] showed that there are limits to the accuracy of verifying the location of data in cloud storage. The authors demonstrated that when a malicious cloud service provider colludes with malicious hosts, it is infeasible for a user to correctly verify the exact location of files.

Furthermore, Watson et al. were the first to take into consideration cases where two or more malicious hosts collude and make copies of the stored files. This assumption led them to posit that the task of restricting the geographic location of data is impossible.

The authors have suggested a proof of location scheme that can be used by a user to obtain the location of a stored file. However, the proposed scheme required significant supplementary infrastructure, as the solution relied on the existence of trusted landmarks responsible for verifying the existence of files on a host.

Like other solutions that rely on distance-bounding protocols [146–148] and latency-based techniques [149, 150], most data geolocation approaches do not address the question of limiting data accessibility according to geographic location.

NIST described a proof of concept for geolocation of data in the cloud [151], relying on a combination of trusted computing, Intel TXT and a set of manual audit steps to verify and enforce data location policies. The use of hardware-based isolation environments has created the capability to restrict the accessibility of cleartext data across jurisdictions (also referred to as “geo-fencing”) [152, 153], however still relying on the cloud storage provider to enforce the data location policies.

In Paper D, we describe an approach to control access to cleartext data in data centers based on their geographic location. The approach is based on sealing the cryptographic material used to protect data integrity and confidentiality to approved geolocation cells described by a set of geolocation coordinates. We used a hardware RoT to unseal the cryptographic material only if the geographic location of the platform is within one of the approved geolocation cells. As a result, data can only be accessed in plaintext if the storage is placed in one of the geolocation cells approved by the data administrator.

In all other cases, the data remains encrypted but can be replicated for redundancy purposes. While in the design and implementation phases we relied on geolocation data reported by the Global Positioning System (GPS), the solution can be implemented with other geolocation systems. The prototype was implemented using the Swift object store [154] (part of the OpenStack project); however, data stored in other types of storage (such as block storage) could be geo-fenced based on the same principles as well.
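The unsealing decision reduces to a point-in-cell check guarding key release, which can be sketched as follows. The cells, coordinates and function names are illustrative assumptions, and a real deployment performs this check inside the hardware RoT rather than in application code:

```python
def in_cell(point, cell):
    """Check whether a (lat, lon) point lies inside a rectangular cell
    given as (lat_min, lat_max, lon_min, lon_max)."""
    lat, lon = point
    lat_min, lat_max, lon_min, lon_max = cell
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

# Illustrative set of administrator-approved geolocation cells.
APPROVED_CELLS = [
    (59.0, 60.0, 17.0, 19.0),  # a cell around Stockholm
    (48.0, 49.0, 2.0, 3.0),    # a cell around Paris
]

def unseal_key(sealed_key: bytes, reported_location) -> bytes:
    # Stands in for the hardware RoT: release the data encryption key
    # only when the platform's reported location falls inside an
    # approved cell; otherwise the data stays ciphertext-only.
    if any(in_cell(reported_location, cell) for cell in APPROVED_CELLS):
        return sealed_key
    raise PermissionError("platform outside approved geolocation cells")

print(unseal_key(b"data-key", (59.3, 18.1)) == b"data-key")  # True: inside the Stockholm cell
try:
    unseal_key(b"data-key", (40.7, -74.0))  # outside all approved cells
except PermissionError:
    print("key withheld; data remains encrypted")
```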

While the approaches presented above provided multiple solutions to the data geolocation problem, this aspect remains unresolved in practical deployments, as users must trust the statements of service providers regarding data location.