092014
Performance of Disk I/O operations during the Live Migration of a
Virtual Machine over WAN
Revanth Vemulapalli, Ravi Kumar Mada
Department of Communication Systems
Blekinge Institute of Technology
SE-371 79 Karlskrona
Sweden
Engineering. The thesis is equivalent to 40 weeks of full time studies.
Contact Information:
Author(s):
Revanth Vemulapalli, Ravi Kumar Mada.
E-mail:
revanth.vemulapalli@gmail.com, m.ravikumar1989@yahoo.com.
University advisor(s):
Dr. Dragos Ilie,
Department of Communication Systems.
University examiner(s):
Prof. Kurt Tutschku,
Department of Communication Systems.
School of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
Internet: www.bth.se/com
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57
Virtualization is a technique that allows several virtual machines (VMs) to run on a single physical machine (PM) by adding a virtualization layer above the physical host's hardware. Many virtualization products allow a VM to be migrated from one PM to another without interrupting the services running on the VM. This is called live migration and offers many potential advantages such as server consolidation, reduced energy consumption, disaster recovery, reliability, and efficient workflows such as "Follow-the-Sun". At present, the advantages of VM live migration are limited to Local Area Networks (LANs), as migrations over Wide Area Networks (WANs) offer lower performance due to IP address changes in the migrating VMs and large network latency.
For scenarios which require migration, shared storage solutions like iSCSI (block storage) and NFS (file storage) are used to store the VM's disk, in order to avoid the high latencies associated with disk state migration when private storage is used. When iSCSI or NFS is used, all disk I/O operations generated by the VM are encapsulated and carried to the shared storage over the IP network. The underlying WAN latency will affect the performance of applications requesting disk I/O from the VM.
In this thesis our objective was to determine the performance of shared and private storage when VMs are live migrated in networks with high latency, with WANs as the typical case. To achieve this objective, we used Iometer, a disk benchmarking tool, to investigate the I/O performance of iSCSI and NFS when used as shared storage for live migrating Xen VMs over emulated WANs. In addition, we configured the Distributed Replicated Block Device (DRBD) system to provide private storage for our VMs through incremental disk replication. We then studied the I/O performance of this private storage solution in the context of live disk migration and compared it to the performance of shared storage based on iSCSI and NFS. The results from our testbed indicate that the DRBD-based solution should be preferred over the considered shared storage solutions, because DRBD consumed less network bandwidth and had a lower maximum I/O response time.
Keywords: Virtualization, Xen, DRBD, iSCSI, NFS, Iometer, Distributed Replicated Block Device, file storage, block storage, virtual machine, I/O performance.
I would like to express gratitude to my supervisor Dr. Dragos Ilie, who introduced the concept of virtualization to us, and for his encouragement and support through valuable suggestions when required. Furthermore, I would like to thank Prof. Dr. Kurt Tutschku for his motivation and useful suggestions throughout the learning process of this master thesis. I would like to thank Dr. Patrik Arlos for lending us hardware for our experiments.
I am fortunate to have loving and caring parents who supported my stay and education against all odds. Without their support I couldn't have studied in Sweden. In my daily work I have been blessed with a friendly and cheerful group of fellow students (Chaitu, Gautam, Venu, Tushal, etc.). I will forever be grateful for all your love and help.
–Revanth Vemulapalli
I would like to express my gratefulness towards my supervisor Dr. Dragos Ilie, without whom this research would not have been possible. I’m grateful to Prof.
Kurt Tutschku for his consistent support and suggestions.
I would like to thank Dr. Patrik Arlos for providing the required equipment and environment. I appreciate the cooperation of my partner Revanth V. and his consistent commitment to this research work.
I would like to thank my friends, who have been a great support. Last but not least, I would also like to thank my family for their endless support.
–Ravi Kumar M
Abstract i
Acknowledgments iii
Acknowledgments v
List of Contents vii
List of Figures xi
List of Equations xiii
List of Tables xv
1 INTRODUCTION 1
1.1 Advantages of virtualization . . . . 2
1.2 Components of Virtualization . . . . 2
1.3 Types of Virtualization . . . . 3
1.3.1 Full Virtualization . . . . 3
1.3.2 Para Virtualization . . . . 3
1.3.3 Hardware Assisted Virtualization . . . . 4
1.4 Virtual Machine Migration . . . . 4
1.5 Problem Statement . . . . 6
1.6 Advantages of Live Migration over WAN . . . . 7
2 Aim, Objectives, Research Questions & Related Work 9
2.1 Aim and Objectives . . . . 9
2.2 Research Questions . . . . 10
2.3 Related work . . . . 10
3 XEN Hypervisor and Storage Solutions 17
3.1 Xen Hypervisor . . . . 17
3.2 Distributed Replicated Block Device (DRBD) . . . . 19
3.2.1 DRBD disk replication algorithm . . . . 21
3.2.2 DRBD Replication Modes . . . . 22
3.2.5 Efficient Synchronization . . . . 24
3.2.6 XEN with DRBD . . . . 25
3.3 iSCSI Block Storage . . . . 27
3.3.1 iSCSI Architecture . . . . 27
3.4 Network File Storage (NFS) . . . . 29
4 Research Methodology and Test-bed 31
4.1 Research Methodology . . . . 31
4.1.1 Stage 1 . . . . 31
4.1.2 Stage 2 . . . . 32
4.1.3 Stage 3 . . . . 32
4.2 Experiment Test-beds . . . . 33
4.3 Hardware configuration . . . . 33
4.4 Benchmarking and other tools . . . . 34
4.4.1 Iometer . . . . 34
4.4.2 NetEm . . . . 35
4.4.3 Wireshark . . . . 35
4.5 Experiment Scenarios . . . . 35
4.5.1 Scenario 1 . . . . 35
4.5.2 Scenario 2 . . . . 36
4.5.3 Scenario 3 . . . . 37
4.6 Performance Metrics . . . . 37
4.6.1 IOps performance . . . . 37
4.6.2 I/O Throughput . . . . 38
4.6.3 Maximum I/O Response Time . . . . 38
4.6.4 Network Load . . . . 38
4.7 Delay Characteristics of WAN . . . . 38
5 Experiment Results 40
5.1 Input Output per Second Performance . . . . 41
5.1.1 No delay (0.02 ms) . . . . 41
5.1.2 Inter-Europe (30ms Delay) . . . . 42
5.1.3 Trans-Atlantic (90ms Delay) . . . . 42
5.1.4 Trans-Pacific (160ms Delay) . . . . 43
5.2 I/O Throughput during Migration . . . . 44
5.2.1 Campus Network (0.02 ms) . . . . 45
5.2.2 Inter-Europe (30ms Delay) . . . . 45
5.2.3 Trans-Atlantic (90ms Delay) . . . . 46
5.2.4 Trans-Pacific (160ms Delay) . . . . 46
5.3 Maximum Response Time (MRT) . . . . 48
5.3.1 No delay (0.02 ms) . . . . 48
5.3.4 Trans-Pacific (160ms Delay) . . . . 50
5.4 Network Load . . . . 51
6 Conclusion and Future work 53
6.1 Conclusion . . . . 53
6.2 Future Work . . . . 54
Appendix 56
1.1 Virtualization . . . . 1
1.2 Type-1 and Type-2 Hypervisor . . . . 3
1.3 Types of Virtualization . . . . 4
1.4 Xen Live Migration [9] . . . . 5
2.1 Three-Phase whole system live migration . . . . 11
2.2 Incremental Migration Algorithm . . . . 12
3.1 Xen Architecture . . . . 18
3.2 DRBD Position in I/O stack . . . . 20
3.3 Block diagram of DRBD algorithm . . . . 21
3.4 Asynchronous protocol . . . . 22
3.5 Memory or semi Synchronous protocol . . . . 23
3.6 Synchronization protocol . . . . 23
3.7 DRBD Three Way Replication . . . . 24
3.8 DRBD disk status during vm live migration. . . . 26
3.9 Communication between initiator and target . . . . 28
3.10 iSCSI Block-Access Protocol . . . . 29
3.11 NFS File-access protocol . . . . 30
4.1 Experiment setup . . . . 32
4.2 Scenario 1 and 2 . . . . 36
4.3 Scenario 3 . . . . 36
5.1 Graph: IOps Vs Network Delay . . . . 44
5.2 Graph: MBps Vs Network Delay . . . . 47
5.3 Graph: MRT vs Network delay . . . . 50
5.4 Graph: Network Bandwidth vs Storage Solutions . . . . 52
3.1 Synchronization time . . . . 25
5.1 Confidence Interval . . . . 40
5.2 Relative error . . . . 41
3.1 Comparison of storage solutions . . . . 27
5.1 IOps performance of different solutions on campus network . . . . 42
5.2 IOps performance of different solutions on Inter-Europe Network . . . . 42
5.3 IOps performance of different solutions on Trans-Atlantic Network . . . . 43
5.4 IOps performance of different solutions on Trans-Pacific Network . . . . 43
5.5 I/O throughput for different solutions on Campus Network . . . . 45
5.6 I/O throughput for different solutions on Inter-Europe Network . . . . 45
5.7 I/O throughput for different solutions on Trans-Atlantic Network . . . . 46
5.8 I/O throughput for different solutions on Trans-Pacific Network . . . . 47
5.9 MRT of different solutions on Campus network . . . . 48
5.10 MRT of different solutions on inter-Europe network . . . . 49
5.11 MRT of different solutions on trans-Atlantic network . . . . 49
5.12 MRT of different solutions on trans-Pacific network . . . . 50
1 NFS address Details . . . . 56
2 iSCSI address Details . . . . 58
ABSS Activity Based Sector Synchronization
AVG Average page dirty rate
CBR Content Based Redundancy
CHAP Challenge-Handshake Authentication Protocol
CPU Central Processing Unit
DBT Dirty Block Tracking
DDNS Dynamic Domain Name Service
DNS Domain Name Service
DRBD Distributed Replicated Block Device
HIST History based page dirty rate
IM Incremental Migration
IO Input Output
IOPS Input Outputs per Second
IPSec Internet Protocol Security
IPv4 Internet Protocol version 4
IPv6 Internet Protocol version 6
iSCSI internet Small Computer System Interface
KVM Kernel-based Virtual Machine
LAN Local Area Network
LMA Local Mobility Anchor
MAG Mobile Access Gateway
MBPS Mega Bytes per Second
MIPv6 Mobile Internet Protocol version 6
MRT Maximum Response Time
NAS Network Attached Storage
NFS Network File Storage
OS Operating System
PBA Proxy Binding Acknowledgement
PBU Proxy Binding Update
PMIPv6 Proxy Mobile Internet Protocol version 6
RDMA Remote Direct Memory Access
RPC Remote Procedure Call
RPD Rapid Page Dirtying
SAN Storage Area Network
SFHA Super-Fast Hash Algorithm
TCP Transmission Control Protocol
TOE TCP Offload Engine
VM Virtual Machine
VMM Virtual Machine Monitor
VMTC Virtual Machine Traffic Controller
VPN Virtual Private Network
WAN Wide Area Network
INTRODUCTION
Hardware virtualization is one of the underlying techniques that makes cloud computing possible [7]. Virtualization is a technique that allows a physical host to run two or more operating systems simultaneously. This technique adds a virtualization layer between the physical host's operating system and hardware, allowing virtualized hosts to use the physical host's processor, memory and I/O devices [1]. The physical host may be a desktop that contains a limited number of virtual hosts or a Data Centre (DC) containing several virtual hosts. Virtualization has many advantages such as server consolidation, provisioning of virtual desktops, reduced energy consumption, hardware cost reduction, etc. A simple example of virtualization, where a single physical machine hosts three virtual machines, is shown in figure 1.1.
Figure 1.1: Virtualization
This report is divided into six chapters. The concepts of virtualization and the advantages of VM migration over WAN are discussed in chapter one. Chapter two describes our aim, research questions and related work. We introduce the Xen hypervisor, the disk replication scheme DRBD and shared storage approaches such as iSCSI and NFS in chapter three. In chapter four we describe our research methodology and experiment setups, with a detailed description of the hardware used in our experiments and of the delay characteristics of WANs. The analysis of the results observed during the experiments is presented in chapter five. We conclude the thesis and discuss possible future work in chapter six.
1.1 Advantages of virtualization
Some of the advantages of virtualization are hardware abstraction and server consolidation, where several small physical server machines can be efficiently replaced by a single physical machine running many virtual server machines. Virtualization is cost efficient: it eliminates the need for several individual physical server machines, thereby reducing the required physical space, energy utilization, etc. Below is a list of advantages [3] [51] associated with virtualization:
• Server consolidation and hardware abstraction,
• Proper resource utilization,
• Cost reduction,
• Low power consumption,
• Reduction of physical space,
• Reduction of global warming impact and corporate greenhouse gas emissions,
• Flexibility,
• High Availability, etc.
1.2 Components of Virtualization
Virtualization has two important components, namely the hypervisor or Virtual Machine Monitor (VMM) and the guest. The hypervisor is responsible for managing the virtualization layer. There are type-1 and type-2 hypervisors, shown in figure 1.2.
A type-1 hypervisor, also known as a bare-metal hypervisor, runs directly above the host's hardware and has direct access to the hardware resources. Type-1 hypervisors provide greater flexibility and better performance due to this design [9]. Xen, Microsoft Hyper-V and VMware ESX are examples of type-1 hypervisors.
A type-2 hypervisor runs on top of the host machine's operating system as an application. For every virtual machine there is one virtual machine monitor to run and control it, so each virtual machine gets its own virtual platform to work on. VirtualBox, VMware Player and VMware Workstation are examples of type-2 hypervisors.
The guest is the virtual host that runs above the virtualization layer. The virtual host has its own operating system (OS) and applications. This guest operating system can be migrated from one physical host to another.
Figure 1.2: Type-1 and Type-2 Hypervisor
1.3 Types of Virtualization
Virtualization can be classified into three types: full virtualization, para-virtualization and hardware assisted virtualization [1]. These three types are shown in figure 1.3.
1.3.1 Full Virtualization
In full virtualization, binary translation is performed on the virtual machine's privileged instructions before they are sent to the physical host's CPU. Full virtualization is thus the combination of binary translation and direct execution: the guest operating system believes that it owns the hardware itself, its privileged instructions are translated by the hypervisor before being sent to the CPU, and user-level instructions are executed directly.
1.3.2 Para Virtualization
Para-virtualization was introduced by the Xen project team. It is a virtualization technique in which the kernel of the guest operating system is modified to make it aware of the hypervisor. In this technique the privileged instructions are replaced by hypercalls that communicate directly with the hypervisor. This technique is efficient and easy to implement [20].
1.3.3 Hardware Assisted Virtualization
Hardware assisted virtualization is also called accelerated virtualization [4]. This technique allows unmodified operating systems to be used by exploiting special features provided by the hardware. Both Intel and AMD have supported hardware virtualization (VT-x / AMD-V) since 2006. In this technique, the virtual machine monitor runs at a root-mode privilege level below ring 0 [53]. That is, hardware virtualization creates an additional privilege level below ring 0 which contains the hypervisor, leaving ring 0 for the unmodified guest operating system [52]. The protection ring structure of hardware assisted virtualization is shown in figure 1.3. CPUs manufactured before 2006 cannot take advantage of this technique.
Figure 1.3: Types of Virtualization
1.4 Virtual Machine Migration
Virtual machine migration is the process of moving a working/running virtual machine from one physical host to another without interrupting the services provided by it. During migration the memory, CPU, network and disk states are moved to the destination host. End users of the services provided by the virtual machine should not notice any significant change. There are three types of migration: cold migration, hot migration and live migration.
In cold migration, the virtual machine is first powered off at the source node before migrating it to the destination node. The CPU state, memory state and existing network connections in the guest OS are lost during cold migration. In hot migration, the virtual machine is suspended at the source node before migrating it to the destination node, where it is resumed. Most of the OS state can be preserved during this migration.
Figure 1.4: Xen Live Migration [9]
In live migration [9] the hosted operating system is migrated along with its CPU, memory and disk state from source to destination while it is still running, without losing active network connectivity. Disk state migration is not necessary when shared storage such as Network Attached Storage (NAS) or a Storage Area Network (SAN) is used. Among the three migration techniques, live migration is best suited to reduce noticeable downtime of services running on the virtual machine [9]. Live migration can also be used to decrease the load on physical machines hosting several virtual machines to a large extent [22]. However, live migration may increase the total migration time of virtual machines running server applications.
In order to perform live migration, the hypervisor needs to move the memory state of the virtual machine from the source to the destination physical machine. The crucial method used for migrating memory state is known as pre-copy, and is explained in detail in [8] and [9]. In the pre-copy phase the memory state of the virtual machine is copied to the destination host in iterations: unmodified or unused pages are moved in the first round and pages modified in the meantime are moved in the following rounds. The hypervisor maintains a dirty bitmap to track modified pages. When a predetermined bandwidth limit is reached or the amount of remaining dirty memory falls below 256 kB, the pre-copy phase is terminated [4] and the stop-and-copy phase is initiated. In the stop-and-copy phase the virtual machine is paused on the source host and the remaining modified pages are copied to the destination host. The virtual machine is then resumed at the destination host and continues working [3] as usual. The Xen live migration timeline is shown in figure 1.4 [9].
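To make the pre-copy idea above concrete, the following Python sketch models the iterative rounds, the dirty bitmap and the final stop-and-copy step. It is a simplified conceptual model rather than Xen's actual implementation; the toy VM class, the page size, the round limit and the way the 256 kB threshold is checked are our own illustrative assumptions.

# Conceptual sketch of pre-copy live migration (illustration only, not Xen code).
import random

PAGE_SIZE_KB = 4
MAX_ROUNDS = 30           # upper bound on pre-copy iterations
STOP_THRESHOLD_KB = 256   # end pre-copy when little dirty memory remains

class ToyVM:
    """A toy VM: a set of memory pages, some of which get dirtied while it runs."""
    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.dirty = set(range(num_pages))    # initially every page must be sent

    def run_for_a_while(self):
        # While the VM keeps running, a random subset of pages is modified.
        self.dirty |= {random.randrange(self.num_pages) for _ in range(64)}

    def take_dirty_bitmap(self):
        d, self.dirty = self.dirty, set()
        return d

def send_pages(pages, destination):
    destination.update(pages)                 # stand-in for a network transfer

def live_migrate(vm, destination):
    # Iterative pre-copy: resend only pages dirtied since the previous round.
    for _ in range(MAX_ROUNDS):
        dirty = vm.take_dirty_bitmap()
        if len(dirty) * PAGE_SIZE_KB < STOP_THRESHOLD_KB:
            vm.dirty |= dirty                 # few enough pages: leave them for stop-and-copy
            break
        send_pages(dirty, destination)
        vm.run_for_a_while()                  # the VM dirties pages in the meantime

    # Stop-and-copy: the VM would be paused here, the remaining dirty pages
    # (and CPU state) are copied, and the VM resumes on the destination host.
    send_pages(vm.take_dirty_bitmap(), destination)

if __name__ == "__main__":
    dest_memory = set()
    vm = ToyVM(num_pages=1024)
    live_migrate(vm, dest_memory)
    print("pages at destination:", len(dest_memory))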
Disk state and network state migration schemes proposed by various researchers are discussed in the related work section of chapter 2.
1.5 Problem Statement
Xen [8], Hyper-V and VMware are some of the virtualization suites that support live migration over a LAN. Some researchers are working on live migration of virtual machines over WAN. This is quite challenging because the WAN typically has limited bandwidth and high latency compared to the LAN. Migration over WAN requires network reconfiguration because the IP address pool (the set of IP addresses available) changes at the destination host after migration. If a Virtual Private Network (VPN) or some other form of tunnel is not used, the source host and destination host reside in different sub-networks, so a VM migrating over WAN must change its IP address to the destination host's subnet in order to keep uninterrupted network connectivity. For example, consider a virtual machine hosting a webserver running at site 1. Due to some issue, such as hardware maintenance or overload, that virtual machine has to migrate to another location, site 2, in a different geographical location. After migration, the virtual machine at site 2, where a different network prefix is used, must acquire a new IP address. Additionally, it may also have to reconfigure its default gateway. This situation leads to loss of connection for clients using services of the webserver running on the virtual machine. We conducted a literature study on the different live migration schemes proposed by various researchers to support live migration of virtual machines over WAN without losing network connectivity. VM migration over WAN has many advantages, such as cloud bursting, load balancing, enabling a "follow the sun" IT strategy [11], consolidation, maintenance, scalability, reliability, recovery, disaster management, etc. [3].
Shared storage and private storage are two data storage techniques used to store the disks of virtual machines. Private data storage, also called direct-attached storage, is storage allotted to a particular host; it cannot share its unused space with other hosts. Shared storage, on the other hand, is space allocated commonly to two or more hosts. If a virtual machine uses private storage, its virtual disk must be migrated along with the memory, CPU and network states. This increases the downtime and total migration time of the virtual machine. Frequent disk migrations also lead to high network overhead due to their size. Therefore, shared storage is used to eliminate the need for disk state migration when a virtual machine is migrated, which reduces the total migration time and downtime.
However, when a VM is migrated over WAN, its performance may still be affected by the high latency in the network, because the virtual machine uses the network to access a disk located at another geographical location. To the best of our knowledge, the disk I/O performance of a virtual machine migrating over WAN has not been fully explored before. Our thesis work may look similar to [2]; the differences between the two works are discussed in section 2.3.
The primary purpose of our thesis is to investigate disk performance during the live migration of a virtual machine over WAN using different storage techniques. We analyse the performance of the iSCSI, NFS and DRBD storage techniques and recommend the best technique for specific scenarios.
1.6 Advantages of Live Migration over WAN
Almost all the advantages of VM live migration are currently limited to LANs, as migrating over WAN affects performance due to high latency and network changes. The main goal of our thesis is to analyse the performance of the various disk solutions available during the live migration of a VM over WAN. When a virtual machine using shared storage is live migrated to a different physical host, end users interacting with a server running on the migrating virtual machine should not sense notable changes in the performance of the server. Live migration is supported by various popular virtualization tools such as VMware and Xen. The following advantages of live migration over WAN motivate our thesis work in this area.
1. Maintenance: During scheduled maintenance, all the virtual machines running on a physical host are migrated to another physical host so that the maintenance work does not interrupt the services provided by the virtual machines.
2. Scaling and Cloud Bursting: Load balancing and consolidation can make good use of virtual machine migration over WAN. If a physical host gets overloaded beyond the capacity of its hardware resources, it will affect the performance of the other virtual machines. The virtual machines can then be migrated (cloud bursted) to physical hosts at other geographical locations to attain load balancing.
3. Power Consumption: Virtual machines running on lightly loaded hosts can be migrated to moderately loaded physical hosts at a different location. This allows the initial host to be shut down to avoid unnecessary power wastage.
4. Disaster Recovery and Reliability: During a disaster, the virtual machine running on a physical host can be saved by migrating it to another physical host over WAN. When a physical host is corrupted or destroyed, the virtual machine can be recreated or booted at a mirror location by using the VM's shared disk and configuration file (in the case of Xen), reducing service downtime.
5. Follow-the-Sun: This is an IT strategy in which a VM is migrated between different time zones in a timely manner. It was designed for teams working on a project around the clock. Team A works on the project during their working hours and the data is then migrated to another location, where team B takes over the work during their working hours and later migrates the data back to team A.
Aim, Objectives, Research Questions &
Related Work
In this chapter we briefly describe our aim and objectives along with the research questions. We also discuss the related work conducted by various researchers on network and disk state migration.
2.1 Aim and Objectives
In this thesis we analyze the performance of disk I/O during the live migration of a virtual machine over an emulated WAN. We used Xen [8], an open-source hypervisor developed at the University of Cambridge Computer Laboratory which supports both hardware assisted virtualization and para-virtualization. Xen also supports live migration of virtual machines. To overcome the limitations of disk migration over WAN, which we reported in chapter 1, we used DRBD to replicate the disk at both cloud locations through periodic updates. We identified the performance metrics relevant to our approach. We also repeated the experiment with the shared storage solutions NFS and iSCSI. Finally, we analyzed the performance of all the disk storage solutions and compared them.
The primary purpose of our thesis is to analyze the performance of a virtual machine's disk during the live migration of the virtual machine over WAN. We analyze the performance of network and distributed storage solutions and recommend the best solution.
We laid out the following objectives to achieve the above aim.
• Conduct a literature review of the various virtual machine migration schemes over WAN.
• Conduct a literature review of the various disk migration algorithms/methods proposed by researchers.
• Conduct a study on Distributed Replicated Block Device (DRBD) tech- nique.
• Perform an experiment on a laboratory test bed.
• Identify the performance metrics to evaluate the above experiment.
• Repeat the experiment with different network storage systems like NFS and iSCSI.
• Analyze the results and draft the observations.
2.2 Research Questions
The following are the research questions we are going to answer in our thesis.
RQ1:
What are the strategies proposed by various researchers to support live virtual machine migration over WAN/Internet?
RQ2:
What are the various disk migration schemes proposed by researchers?
RQ3:
How efficiently can the Distributed Replicated Block Device (DRBD) mirror disk data at different locations?
RQ4:
What is the I/O performance when migrating virtual machines over a WAN using DRBD?
RQ5:
What is the I/O performance when migrating virtual machines with shared storage solutions over a WAN?
RQ6:
Among distributed and shared storage, which solution performs better when migrating over a WAN?
2.3 Related work
Luo et al. [12] discussed a live migration scheme which migrates the local disk, CPU and memory state of a virtual node. They proposed a three-phase migration algorithm that minimizes the downtime caused by large disk migrations. The three-phase migration algorithm is shown in figure 2.1. This migration scheme has three phases, named pre-copy, freeze-and-copy and post-copy, similar to the Xen hypervisor's memory migration process. In the first stage, disk data is copied to the destination in n iterations; a block bitmap is used to track disk blocks dirtied during this stage, and the dirtied data is copied in the next iteration. This stage is limited to a particular number of iterations and is proactively stopped when the dirtying rate is faster than the transfer rate. In the freeze-and-copy phase, the virtual machine running on the source machine is suspended and the dirtied memory pages are migrated to the destination along with the CPU state. The third stage is a combination of push and pull: the virtual machine is resumed and, according to the dirty block bitmap, the source pushes the dirty blocks while the destination machine pulls them. This process reduces the total migration time of the virtual machine.
Figure 2.1: Three-Phase whole system live migration
The authors of [12] also developed a mechanism called the Incremental Migration (IM) algorithm, which reduces the total migration time when migrating the virtual machine back to the source machine. When the virtual machine is sent back to the physical machine from which it was migrated earlier, the algorithm checks the block bitmap to find the blocks of the virtual disk dirtied after the earlier migration. Only these dirtied blocks are migrated to the target physical location. The algorithm is shown in figure 2.2.
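A minimal sketch of the block-bitmap idea behind incremental migration is shown below: only blocks marked dirty since the previous migration are transferred back. The data structures and function names are our own illustration of the mechanism, not the implementation from [12].

# Illustrative sketch of incremental (block-bitmap based) disk migration.
def migrate_disk_back(disk_blocks, dirty_bitmap, destination_disk):
    """Copy only the blocks marked dirty since the earlier migration."""
    transferred = 0
    for block_no, is_dirty in enumerate(dirty_bitmap):
        if is_dirty:
            destination_disk[block_no] = disk_blocks[block_no]  # resend this block
            dirty_bitmap[block_no] = False                      # clear the mark
            transferred += 1
    return transferred

if __name__ == "__main__":
    # The destination already holds the disk image from the previous migration.
    source = ["block-%d" % i for i in range(8)]
    destination = list(source)

    # The VM modified blocks 2 and 5 after it migrated away.
    source[2], source[5] = "block-2'", "block-5'"
    dirty = [False, False, True, False, False, True, False, False]

    sent = migrate_disk_back(source, dirty, destination)
    print(sent, "blocks transferred; images identical:", destination == source)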
Figure 2.2: Incremental Migration Algorithm
Kleij et al. [2] conducted an experiment on a laboratory test-bed to evaluate the possibility of using the Distributed Replicated Block Device (DRBD) [6] for migrating disk state over WAN; DRBD creates two copies of the data disk by mirroring. DRBD supports asynchronous, semi-synchronous and fully synchronous replication modes, all of which are discussed in section 3.2.2. The authors expected that using DRBD in asynchronous mode would give better performance. They constructed a laboratory test bed with a 50 ms RTT delay to emulate a WAN and compared the performance of DRBD with high-latency internet Small Computer System Interface (iSCSI), concluding that the DRBD test was 2.5 times faster than the remote iSCSI test. The downtime observed during the virtual machine's migration was only 1.9 seconds. Regarding read statistics, DRBD's performance was better than remote iSCSI, but there was no significant difference between the performance of DRBD and local iSCSI. However, the authors admit that there are some inconsistencies in their results, which they cannot account for. Our study may look similar to this work; in our thesis we include NFS as a shared storage solution and we also measure the performance for different latencies. In [2] the authors analyzed the virtual machine's disk performance based on migration time, mean HTTP request time and I/O read statistics, and they used DRBD's asynchronous replication protocol to replicate data between nodes. We analyzed the disk I/O performance using the performance metrics discussed in section 4.6, conducting both read and write I/O tests, and we used DRBD's synchronous replication protocol, which is safer than the asynchronous protocol used in [2]. Our experiment results confirm that DRBD outperformed iSCSI and NFS while the virtual machine was migrated over WAN.
Wood et al. [11] discussed solutions for the limitations faced by virtual machines migrating over WAN. They experimented with an architecture called CloudNet that is distributed across datacenters in the United States separated by 1200 km. Their architecture reduced network bandwidth utilization by 50%, and memory migration and pause time by 30 to 70%. The authors used the DRBD disk replication system to replicate the disk at both the source and the destination. First, DRBD brings the remote disk to a consistent state using its synchronization daemon. When both disks are synchronized and stable, DRBD is switched to its synchronous replication protocol and modified data is placed in a TCP buffer for transmission to the destination disk; a write completes in synchronous mode once it is confirmed at the destination. The memory and CPU states are migrated after the disk migration stage. A Content Based Redundancy (CBR) [11] block technique is used to save bandwidth. CBR splits disk blocks and memory pages into fixed-size blocks and uses the Super-Fast Hash Algorithm (SFHA) to generate hashes based on their content. These hashes are used to detect blocks that have already been sent. This solution saved 20 GB of bandwidth compared to migrating the full disk, and it reduced memory transfer time by 65%. They also designed a scheme to retain network state using a layer-2 VPN solution. In their experiments the authors analyzed the network bandwidth used for virtual machine migration along with memory migration and pause times. In our experiment, on the other hand, we focus on the disk I/O performance of DRBD, iSCSI and NFS when the VM is migrating over WAN, along with network utilization. From our experiments we can say that iSCSI and NFS consumed more than 75% more network bandwidth than DRBD.
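The following sketch illustrates the content-based redundancy idea: data is split into fixed-size blocks, each block is hashed, and a block whose hash has been seen before is replaced by a short reference instead of being retransmitted. Python's hashlib is used here as a stand-in for the Super-Fast Hash Algorithm; the block size, message format and cache handling are illustrative assumptions, not CloudNet's actual code.

# Illustrative sketch of content-based redundancy (CBR) elimination.
import hashlib

BLOCK_SIZE = 4096  # fixed block size in bytes (assumed for illustration)

def cbr_encode(data: bytes, sent_hashes: set):
    """Split data into fixed-size blocks; transmit a block only if its hash is new."""
    messages = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).digest()   # stand-in for SFHA
        if digest in sent_hashes:
            messages.append(("REF", digest))    # receiver already has this block
        else:
            sent_hashes.add(digest)
            messages.append(("DATA", block))    # block must actually be transmitted
    return messages

if __name__ == "__main__":
    cache = set()
    disk_image = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # contains redundant content
    msgs = cbr_encode(disk_image, cache)
    sent = sum(len(m[1]) for m in msgs if m[0] == "DATA")
    print(len(disk_image), "bytes of data,", sent, "bytes actually transmitted")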
Akoush et al. [19] discussed the parameters which affect the live migration of virtual machines. Total migration time and downtime are the two important performance parameters used to analyze a virtual machine's behaviour during migration. Total migration time is the time required to move a virtual machine from one physical host to another; downtime is the period of time during which the virtual machine does not run. The page dirtying rate and the network bandwidth are the factors which affect the total migration time and downtime. The authors implemented two simulation models, namely average page dirty rate (AVG) and history based page dirty rate (HIST), which are accurate up to 90 percent of actual results. They also proposed a model called Activity Based Sector Synchronization (ABSS) [22] which migrates virtual machines efficiently in a timely manner. The ABSS algorithm predicts the sectors that are likely to be altered, which helps in minimizing the total migration time and the network bandwidth consumed.
Robert et al. [23] worked on both disk-related and network-related issues. They migrate the local persistent state by combining a block-level solution with pre-copying. When the virtual machine needs to migrate, the disk state is first pre-copied to the destination host while the virtual machine continues to run. After some time, this mechanism triggers Xen to migrate the memory and CPU states of the virtual machine. All changes made to the disk on the source side during the migration process are recorded in the form of deltas (units which contain the written data, its location and size). These deltas are sent to the destination and applied to the disk image. This mechanism has a downtime of 3 seconds in a LAN and 68 seconds in a WAN. They combined Dynamic DNS (DDNS) with IP tunneling to retain old network connections after live migration. When the virtual machine on the source host enters the paused state, a tunnel is created using Linux iproute2 between the virtual machine's old IP address and its new IP address at the destination. This mechanism drops the packets on the source host's side during the final stage of migration; packets belonging to old connections are forwarded from the source host to the IP address of the virtual machine at the destination via the tunnel. The virtual machine is configured with a new dynamic IP address to prevent it from using the old tunnel for new connections; thus, the virtual machine will have two IP addresses. The disadvantage of this mechanism is that the tunnel between source and destination cannot be closed until the old connections end, which requires the source host to keep running until the old connections are closed.
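Below is a minimal sketch of the delta mechanism described above: writes performed on the source disk during migration are recorded as (location, size, data) units and later replayed on the destination image. The structures and names are our own illustration, not the authors' code.

# Illustrative sketch of recording disk-write deltas and replaying them remotely.
from collections import namedtuple

# A delta records what was written, where, and how much.
Delta = namedtuple("Delta", ["offset", "size", "data"])

def record_write(deltas, offset, data):
    """Intercept a write on the source disk while the pre-copied image migrates."""
    deltas.append(Delta(offset=offset, size=len(data), data=data))

def apply_deltas(image: bytearray, deltas):
    """Replay the recorded writes onto the destination disk image."""
    for d in deltas:
        image[d.offset:d.offset + d.size] = d.data

if __name__ == "__main__":
    source_image = bytearray(b"\x00" * 64)
    dest_image = bytearray(source_image)   # result of the pre-copy stage

    deltas = []
    record_write(deltas, 10, b"hello")     # the VM writes during migration
    source_image[10:15] = b"hello"
    record_write(deltas, 40, b"world")
    source_image[40:45] = b"world"

    apply_deltas(dest_image, deltas)
    print("images consistent:", dest_image == source_image)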
Kazushi et al. [13] used a data duplication technique to propose fast virtual machine storage migration, which reduces the volume of data transferred and the transfer time. Suppose a virtual machine must migrate frequently between site A and site B. Migrating a large disk between sites A and B wastes a lot of network resources and takes a long time. These frequent full-disk migrations can be eliminated by exploiting duplication: when the VM is migrated for the first time from site A to site B, the full disk is copied to site B; when the VM migrates back to site A, only the changes made to the disk at site B are replicated to the disk at site A. A new diff image structure, which is a new virtual machine image format, and Dirty Block Tracking (DBT) were developed to track the changes. This technique successfully reduced the migration time from 10 minutes to 10 seconds.
Travostino et al. [5] discussed the advantages and requirements of long-haul live migration. Their experiment proved that virtual machines can migrate over WAN across geographically distant locations instead of being limited to small local datacenters. Their scheme has a dedicated agent called the Virtual Machine Traffic Controller (VMTC) that creates dynamic tunnels between clients and the virtual hosts. The VMTC is responsible for the migration of the virtual machine and maintains connectivity with the destination host to which the virtual machine should be migrated. The VMTC communicates with an Authentication, Authorization and Accounting (AAA) module to obtain authentication, in the form of a token, to set up an on-demand end-to-end path. It is the responsibility of the VMTC to migrate the virtual machine and reconfigure the IP tunnels so that it can seamlessly communicate with its external clients. When a virtual host is migrated from one host to another, the old tunnel is torn down and a new tunnel is created between the client and the virtual host. They migrated their virtual machine between Amsterdam and San Diego using a 1 Gbps non-switched link with a Round Trip Time (RTT) of 198 ms. The application downtime observed during the migration was in the range of 0.8 to 1.6 seconds. Their experiment results show that, compared to LAN migration, the downtime of migration over WAN is just 5-10 times higher, even though the round trip time is more than 1000 times higher. This may be due to the dynamically established 1 Gbps link (light path) using layer-1 and layer-2 circuits without routing protocols; layer-3 protocols are still used for communication between the VM and its clients.
Eric et al. [14] used Mobile IPv6 (MIPv6), a protocol for managing the mobility of mobile nodes across wireless networks, to support live migration of virtual machines over WAN. The virtual machines behave as mobile nodes by enabling MIPv6 [24] in the TCP/IP protocol stack. This scheme enables a virtual machine to retain its IP address after migrating from the source host and eliminates the need for DNS updates [23] for the virtual machine. The disadvantage of this scheme is that the virtual machine's kernel must be modified to support mobility (unless that was already preconfigured).
Solomon et al. [15] presented a live migration approach using the Proxy Mobile IPv6 (PMIPv6) protocol. This approach is similar to MIPv6 [14], but it does not require the installation of mobility software on the virtual machine [25]; however, it requires specific infrastructure deployed in the network. In their experiment the source host and destination host act as Mobile Access Gateways (MAGs) and are connected to a Local Mobility Anchor (LMA). This choice was made for convenience, to keep the experiment testbed simple; the LMA and MAGs are in principle independent infrastructure elements. The LMA and MAGs are the equivalents of the home agent and foreign agent in Mobile IP. The LMA ensures that packets destined for the mobile node are forwarded to the MAG where the node is currently located, while the MAG is responsible for emulating the mobile node's home network and for forwarding packets sent by the mobile node over the tunnel to the LMA. The LMA acts as a central management node tracking the virtual machines.
When a virtual machine is booted on the source machine, say MAG1, it registers with the LMA via Proxy Binding Update (PBU) and Proxy Binding Acknowledgement (PBA) messages. Upon successful registration a tunnel is created between MAG1 and the LMA, and the virtual machine can then reach the outside world via the LMA. When the VM is migrated to another location, for example to MAG2, the tunnel between MAG1 and the LMA is torn down and the VM is deregistered via PBU and PBA messages. A tunnel is then created between MAG2 and the LMA using the same process as for MAG1. In this solution the VM retains its IP address and network connections after the migration. Their experiment results showed that this approach migrated the virtual machine with lower downtime than the MIPv6 approach. In this experiment the authors used iSCSI shared storage to store the VM's disk to avoid disk state migration.
Forsman et al. [16] worked on automatic, seamless live migration of virtual machines within a data center. They developed an algorithm that migrates virtual machines based on CPU workload, using two strategies called push and pull. When a physical host's workload crosses an upper threshold, a hotspot is detected and the push phase initiates virtual machine migration to another, underutilized physical host to balance the load. When a physical host is underutilized, i.e. below the lower threshold, the pull phase is initiated to balance the system by requesting virtual machines from other physical nodes. They also used a concept called variable threshold, which varies the threshold level periodically and resulted in a more balanced system. They successfully simulated their technique using the OMNeT++ simulation software.
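A compact sketch of the push/pull idea follows: hosts above an upper load threshold push a VM away, and hosts below a lower threshold pull one in. The thresholds, the load model and the VM selection policy are hypothetical simplifications of the strategy described in [16].

# Illustrative sketch of push/pull, threshold-based VM load balancing.
UPPER_THRESHOLD = 0.80   # hotspot: push a VM away above this CPU load
LOWER_THRESHOLD = 0.30   # underutilized: pull a VM in below this CPU load

def host_load(host):
    return sum(host["vms"].values())          # CPU load = sum of its VMs' loads

def rebalance(hosts):
    """One push/pull pass over all hosts; returns the migrations performed."""
    migrations = []
    for name, host in hosts.items():
        load = host_load(host)
        if load > UPPER_THRESHOLD:             # push phase: offload the heaviest VM
            target = min(hosts, key=lambda h: host_load(hosts[h]))
            if target != name:
                vm = max(host["vms"], key=host["vms"].get)
                hosts[target]["vms"][vm] = host["vms"].pop(vm)
                migrations.append((vm, name, target))
        elif load < LOWER_THRESHOLD:           # pull phase: request a light VM
            donor = max(hosts, key=lambda h: host_load(hosts[h]))
            if donor != name and hosts[donor]["vms"]:
                vm = min(hosts[donor]["vms"], key=hosts[donor]["vms"].get)
                host["vms"][vm] = hosts[donor]["vms"].pop(vm)
                migrations.append((vm, donor, name))
    return migrations

if __name__ == "__main__":
    hosts = {
        "host-A": {"vms": {"vm1": 0.5, "vm2": 0.4}},   # overloaded (0.9)
        "host-B": {"vms": {"vm3": 0.2}},               # lightly loaded
    }
    print(rebalance(hosts))   # e.g. [('vm1', 'host-A', 'host-B')]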
XEN Hypervisor and Storage Solutions
In this chapter we describe the Xen hypervisor and the various storage solutions we used to analyse disk performance during the live migration of a virtual machine. Xen, Kernel-based Virtual Machine (KVM), VMware ESXi and Hyper-V are some of the widely used type-1 hypervisors. Among these hypervisors, VMware ESXi and Hyper-V are expensive, proprietary software with restrictive licence schemes, so we restricted ourselves to the open-source hypervisors Xen and KVM. KVM requires the physical host to support hardware assisted virtualization (HVM), but unfortunately our testbed does not have hardware supporting HVM. We therefore used the Xen hypervisor and created para-virtualized virtual machines. We briefly describe the Xen hypervisor along with its features in section 3.1.
We used the internet Small Computer System Interface (iSCSI), Network File Storage (NFS) and Distributed Replicated Block Device (DRBD) storage solutions to store the disks of the virtual machines. These are three widely used storage solutions supported by various hypervisors for storing the disk state of virtual machines. Each storage solution has its own advantages and disadvantages; a short comparison of the three is shown in table 3.1.
Among these three solutions, DRBD and iSCSI are block storage solutions and NFS is a file storage solution. In this thesis we used the Xen hypervisor to migrate a virtual machine while conducting disk I/O tests. Our virtual host keeps its disk on the shared or replicated storage solution, so Xen migrates only the memory and CPU states, leaving the disk state in place. The performance of the VM's disk I/O operations depends on the underlying storage solution, so we assume that our results are also valid for other available hypervisors.
3.1 Xen Hypervisor
Xen [8] is an open-source hypervisor developed at the University of Cambridge Computer Laboratory which supports both hardware assisted virtualization and para-virtualization. It also supports live migration of virtual machines. Xen is one of the five type-1 (bare-metal) hypervisors that are available as open source [42]; the others are KVM, OpenVZ, VirtualBox and Lguest. Xen allows multiple operating systems to run in parallel on the same host. Xen is used for different open-source and commercial applications such as desktop virtualization, server virtualization, Infrastructure as a Service (IaaS), security, and hardware and embedded appliances. Today the Xen hypervisor powers large production clouds [20]. The Xen hypervisor is responsible for handling interrupts, scheduling the CPU and managing memory for the virtual machines.
Figure 3.1: Xen Architecture
Dom0, or Domain-0, is the domain in which Xen starts during boot. From the Xen architecture shown in figure 3.1 we can see that Dom0 is the privileged control domain which has direct access to the underlying hardware. Dom0 contains the toolstack, a user management interface to the Xen hypervisor. The Xen toolstack can create, manage and destroy virtual machines, or domUs, which are unprivileged domains [20]. Xen supports hardware virtualization and para-virtualization. In hardware virtualization, unmodified operating systems can be used for the virtual machines, whereas para-virtualization requires modification of the operating system kernel running inside the virtual machines; doing so increases the performance of para-virtualized hosts. The host operating system must be Xen para-virtualization enabled in order to create such virtual machines. Linux kernels before version 2.6.37 are not para-virtualization enabled and must be recompiled to enable it; all Linux kernels released after version 2.6.37 are Xen para-virtualization enabled by default.
Xen allows virtual machines to migrate between hosts while the guest operating system is running; this feature is called live migration. In Xen, the daemons running in Dom0 on the source and destination hosts are responsible for the migration: the memory and CPU states of the virtual machine are migrated from the source machine to the destination machine by the control domain. The Xen hypervisor copies memory pages in a series of rounds using the Dynamic Rate-Limiting and Rapid Page Dirtying [8] techniques to reduce the service downtime. The Dynamic Rate-Limiting algorithm adapts the bandwidth limit for each pre-copy round and is used to decide when the pre-copy stage should end and the stop-and-copy phase should start. The Rapid Page Dirtying algorithm is used to detect rapidly dirtied pages (pages modified so frequently that they are better left to the stop-and-copy phase) and to skip copying them in the pre-copy stage. Xen uses a microkernel design with a small footprint and interface, around 1 MB in size, making it more secure and robust than the other available hypervisors. The Xen hypervisor is capable of running the main device drivers for a system inside a virtual machine, and the virtual machine that contains the main device drivers can be rebooted leaving the rest of the system unaffected [20]. This feature of Xen is called driver isolation; it provides a safe execution environment which protects the virtual machines from buggy drivers [55]. Xen is operating system agnostic, which means different operating systems such as NetBSD and OpenSolaris can be hosted.
Xen supports two types of virtual block devices, named 'phy' and 'file'. A phy device is a physical block device available in the host environment, whereas a file device is a disk image available in the form of a file on the host computer; a loopback block device is created from the image file and handed to the domU. Shared storage solutions like iSCSI use phy, while NFS uses file.
3.2 Distributed Replicated Block Device (DRBD)
DRBD [43] stands for Distributed Replicated Block Device. It is a software-based tool which replicates data between block devices on different servers and provides a virtual/cluster storage solution. The block devices may be hard disks, logical volumes or disk partitions. With DRBD, data is replicated as soon as an application writes to or modifies the data on the disk; top-level applications are not aware of the replication. DRBD uses synchronous and asynchronous techniques to mirror the block device. DRBD is controlled by cluster software called Heartbeat: if an active node crashes, Heartbeat initiates the failover process. Each DRBD peer acts in either a primary or a secondary role. User space applications on the secondary node do not have write access to the DRBD disk resource, but they can read data. Write access to data is granted only