
Service Migration in Virtualized Data Centers


Kyoomars Alizadeh Noghani


Modern virtualized Data Centers (DCs) require efficient management techniques to guarantee high quality services while reducing their economic cost. The ability to live migrate virtual instances, e.g., Virtual Machines (VMs), both inside and among DCs, is a key operation for the majority of DC management tasks that brings significant flexibility into the DC infrastructure. However, live migration introduces new challenges as it ought to be fast and seamless while at the same time imposing a minimum overhead on the network.

This thesis investigates the networking challenges of short and long-haul live VM migration in Software Defined Networking (SDN) enabled DCs. We propose solutions to make the intra- and inter-DC live VM migration more seamless. Our proposed SDN-based framework for inter-DC migration improves the management, enhances the performance, and increases the scalability of interconnections among DCs.

Moreover, by considering the overhead of VM migration over the network, servers, and quality of service the VM provides, we explore the trade-off between the costs required to change the placement of VMs and the optimality degree of the placement in the DC. Results show that the cost of improving the placement might hamper the gain that could be achieved.

DOCTORAL THESIS | Karlstad University Studies | 2020:1

ISSN 1403-8099 | ISBN 978-91-7867-083-3 (pdf) | ISBN 978-91-7867-073-4 (print)

Print & layout: University Printing Office, Karlstad, 2020


Print: Universitetstryckeriet, Karlstad 2020

Distribution:
Karlstad University
Faculty of Health, Science and Technology
Department of Mathematics and Computer Science
SE-651 88 Karlstad, Sweden

+46 54 700 10 00

© The author

ISSN 1403-8099

urn:nbn:se:kau:diva-75921



Service Migration in Virtualized Data Centers

Kyoomars Alizadeh Noghani

Department of Computer Science, Karlstad University, Sweden

Abstract

Modern virtualized Data Centers (DCs) require efficient management techniques to guarantee high quality services while reducing their economic cost. The ability to live migrate virtual instances, e.g., Virtual Machines (VMs), both inside and among DCs is a key operation for the majority of DC management tasks that brings significant flexibility into the DC infrastructure. However, live migration introduces new challenges as it ought to be fast and seamless while at the same time imposing a minimum overhead on the network. In this thesis, we study the networking problems of live service migration in modern DCs when services are deployed in virtualized environments, e.g., VMs and containers. In particular, this thesis has the following main objectives: (1) improving live VM migration in Software-Defined Networking (SDN) enabled DCs by addressing the networking challenges of live VM migration, and (2) investigating the trade-off between the reconfiguration cost and the optimality of the Service Function Chain (SFC) placement after the reconfiguration has been applied, when SFCs are composed of stateful Virtual Network Functions (VNFs).

To achieve the first objective, we use distinctive characteristics of SDN architectures, such as their centralized control over the network, to accelerate the network convergence time and address the suboptimal routing problem. Consequently, we enhance the quality of intra- and inter-DC live migrations. Furthermore, we develop an SDN-based framework that improves inter-DC live VM migration by automating the deployment, improving the management, enhancing the performance, and increasing the scalability of interconnections among DCs.

To accomplish the second objective, we investigate the overhead of dynamic reconfiguration of stateful VNFs. Dynamic reconfiguration of VNFs is frequently required in various circumstances, and live migration of VNFs is an integral part of this operation. By mathematically formulating the reconfiguration costs of stateful VNFs and developing a multi-objective heuristic solution, we explore the trade-off between the reconfiguration cost required to improve a given placement and the degree of optimality achieved after the reconfiguration is performed. Results show that the cost of performing the reconfiguration operations required to realize an optimal VNF placement might hamper the gain that could be achieved.

Keywords: Data Center, Ethernet VPN, EVPN, Live Service Migration, Reconfiguration, SDN, Virtual Network Function, VNF.


Acknowledgments

Ph.D. students, to be successful, should eat the ‘Ph.D. elephant’ one bite at a time. Eating the last bite is the most bittersweet part of it. It reminds me of all the challenges I encountered, while the good feeling of how much I learned, tried, and accomplished runs through my mind. Now, it is time to acknowledge the people who helped me through this challenging route.

First and foremost, I would like to express my sincere gratitude to my main advisor, Professor Andreas Kassler, for his insightful advice, reliable guidance, and full support. His notable intelligence in reading my mind reduced the complexity of describing the research challenges that I was dealing with. I want to extend my sincere thanks to my committed co-supervisor, Senior Lecturer Karl-Johan Grinnemo, for his continued attention and valuable guidance. A special thanks to Dr. Hakim Ghazzai, who made three months of my Ph.D. unforgettable. I have been privileged to work with this great fellow. I would also like to thank all my co-authors, colleagues, and friends from the Department of Computer Science at Karlstad University.

I would like to thank my wonderful parents, brother, sister, and in-laws for their unconditional love and spiritual support. Without them, I would not have come this far.

This dissertation is dedicated to my beloved wife, Farzaneh, and our new family member, Noyan. Farzaneh, you mean the world to me. I am utterly blessed to have you in my life. Noyan, I have learned from my Ph.D. research that moving is indispensable but is also challenging since it has to be fast and seamless. I truly hope that on the day you decide to move from our home and become independent I can manage my emotional challenges fast and seamlessly.


List of Appended Papers

This thesis is based on the work reported in the following papers.

I. Cristian Hernandez Benet, Kyoomars Alizadeh Noghani, and Andreas Kassler. Minimizing Live VM Migration Downtime Using OpenFlow based Resiliency Mechanisms. In 5th IEEE Conference on Cloud Networking (Cloudnet), Pisa, Italy, October 3–5, 2016.

II. Kyoomars Alizadeh Noghani, Cristian Hernandez Benet, Andreas Kassler, Antonio Marotta, Patrick Jestin, and Vivek Srivastava. Automating Ethernet VPN Deployment in SDN-based Data Centers. In 4th IEEE Conference on Software Defined Systems (SDS), Valencia, Spain, May 8–11, 2017.

III. Cristian Hernandez Benet, Kyoomars Alizadeh Noghani, Andreas Kassler, Ognjen Dobrijević, and Patrick Jestin. Policy-based Routing and Load Balancing for EVPN-based Data Center Interconnections. In IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), Berlin, Germany, November 6–8, 2017.

IV. Kyoomars Alizadeh Noghani and Andreas Kassler. SDN Enhanced Ethernet VPN for Data Center Interconnect. In 6th IEEE Conference on Cloud Networking (Cloudnet), Prague, Czech Republic, September 25–27, 2017.

V. Kyoomars Alizadeh Noghani, Andreas Kassler, and Prem Sankar Gopan-nan. EVPN/SDN Assisted Live VM Migration between Geo-Distributed Data Centers. In 4th IEEE Conference on Network Softwarization (NetSoft), Montreal, Canada, June 25–29, 2018.

VI. Kyoomars Alizadeh Noghani, Andreas Kassler, and Javid Taheri. On the Cost-Optimality Trade-off for Service Function Chain Reconfigura-tion. In 8th IEEE Conference on Cloud Networking (Cloudnet), Coimbra, Portugal, November 4–6, 2019.

VII. Kyoomars Alizadeh Noghani, Andreas Kassler, Javid Taheri, Peter Öhlén, and Calin Curescu. On the Cost-Optimality Trade-off for Fast Service Function Chain Reconfiguration. Under Submission.

VIII. Cristian Hernandez Benet, Robayet Nasim, Kyoomars Alizadeh Noghani, and Andreas Kassler. OpenStackEmu - A Cloud Testbed Combining Network Emulation with OpenStack and SDN. In 14th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, January 8–11, 2017.

Note: Some of the appended papers have been subjected to minor editorial changes.


Comments on my Participation

Paper I The initial idea of the paper originated from discussions with my colleague, Cristian Hernandez Benet. Cristian and I participated equally in developing all proposed resiliency solutions to address the challenges of intra-DC VM migration. Additionally, I was equally involved in writing the following sections of the paper: introduction, background, proposed solutions, and conclusion. Conducting the experiments and writing the evaluation section were done by Cristian.

Paper II I designed, developed, and implemented the proposed framework as well as conducted the experiments for the evaluations. Moreover, I am the principal author of all parts of the paper. My co-author, Cristian, assisted me in one of the experiments and helped me write and review the paper.

Paper III Cristian Hernandez Benet is the main author of this paper. Cristian and I worked equally on developing the architecture and traffic engineering policies. The experiments were prepared and conducted by Cristian. I mainly authored the architecture section and helped Cristian with writing all sections of the paper except the evaluations.

Paper IV I am the main author of the paper. The idea of the paper came from reading the IETF documents about the EVPN technology. I proposed an SDN-based solution and conducted the experiments for the evaluations.

Paper V The idea of the paper came from watching Cisco summits on data center networks. I further investigated the problem, proposed an SDN-based solution, and carried out all the experimental evaluations presented in the paper. Furthermore, I authored all sections of the paper.

Paper VI The idea of the paper originated from a meeting with the Ericsson research group in Kista, Sweden. I investigated the overhead of service function chain reconfiguration and implemented a mathematical model to investigate the trade-off between the SFC reconfiguration cost and the optimality degree of the virtual network function placement. I carried out all the evaluations and I authored all sections of the paper.

Paper VII This paper is a continuation of the work done in Paper VI. The optimization problem proposed in Paper VI is computationally complex and does not scale. Hence, a heuristic solution was required to provide the means to study the same trade-off mentioned above for Paper VI, but for various scenarios and system parameters. I developed a multi-objective heuristic solution and performed all the evaluations. Furthermore, I was responsible for writing and structuring the paper.


Paper VIII In addition to fruitful discussions on all aspects of this work, all co-authors collaborated equally in defining use case scenarios, developing OpenStackEmu (a cloud experimentation testbed), and writing the paper. For the demonstration part, I was mainly responsible for setting up the SDN controller for routing the traffic inside the private cloud infrastructure and showing the dynamic load of the network links graphically.

Other Publications

• Kyoomars Alizadeh Noghani, Hakim Ghazzai, and Andreas Kassler. A Generic Framework for Task Offloading in mmWave MEC Backhaul Networks. In IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, December 9–14, 2018.

• Kyoomars Alizadeh Noghani. Towards Seamless Live Migration in SDN-based Data Centers. Licentiate Thesis, Karlstad University, December 2018.

• Cristian Hernandez Benet, Kyoomars Alizadeh Noghani, and Javid Taheri. SDN implementations and protocols. In Big Data and Software Defined Networks, Chapter 2, Pages 27–48, The Institution of Engineering & Technology, March 2018.

• Kyoomars Alizadeh Noghani, Cristian Hernandez Benet, and Javid Taheri. SDN helps volume in Big Data. In Big Data and Software Defined Networks, Chapter 9, Pages 185–205, The Institution of Engineering & Technology, March 2018.

• Abdelmounaam Rezgui, Kyoomars Alizadeh Noghani, Javid Taheri, Amir Mirzaeinia, Hamdy Soliman, and Nikolas Davis. SDN helps Big Data to become fault tolerant. In Big Data and Software Defined Networks, Chapter 15, Pages 319-336, The Institution of Engineering & Technology, March 2018.

Contents

List of Appended Papers

Introductory Summary

1 Introduction
  1.1 Migration as Part of DC Management
  1.2 Cost of Service Function Chain Reconfiguration
  1.3 Thesis Structure

2 Background
  2.1 Concepts
    2.1.1 Live Service Migration
    2.1.2 Data Center, Cloud, and Edge
    2.1.3 Software-Defined Networking
    2.1.4 Network Virtualization
    2.1.5 Network Function Virtualization
  2.2 Technologies
    2.2.1 SDN-based Resiliency Mechanisms
    2.2.2 VXLAN
    2.2.3 EVPN
  2.3 Model-Driven Network Management

3 Related Work
  3.1 Challenges of Seamless Live VM Migration
    3.1.1 Retain Network Connectivity
    3.1.2 Reduce Network Convergence Time
    3.1.3 Resolve Suboptimal Routing Problem
    3.1.4 Performance Improvement of Migration
    3.1.5 Performance Improvement of Required Technologies
  3.2 Reconfiguration of SFCs
    3.2.1 Flow Migration
    3.2.2 Creation, Termination, and Relocation
    3.2.3 Migration of VNFs with Special Requirements

4 Research Questions

5 Contributions

6 Research Methodology

7 Summary of Appended Papers

Paper I: Minimizing Live VM Migration Downtime Using OpenFlow based Resiliency Mechanisms
1 Introduction
2 Background
  2.1 SDN-based Resiliency Mechanisms
  2.2 Live VM Migration
3 Flow Restoration for VM Migration
  3.1 Legacy Network based Live VM Migration
  3.2 SDN-based Live VM Migration
  3.3 SDN-based Live VM Migration with Fast Failover
  3.4 SDN-based Live VM Migration with Packet Bicasting
  3.5 SDN-based Live VM Migration using Stateful Forwarding
4 Experimental Evaluation
5 Conclusion

Paper II: Automating Ethernet VPN Deployment in SDN-based Data Centers
1 Introduction
2 Background
3 Architecture and Implementation
  3.1 High-Level Architecture
  3.2 Enhanced SDN Functionalities for EVPN
  3.3 SDN Controller Modules
    3.3.1 Neutron
    3.3.2 L2VPN Service
    3.3.3 BGP-EVPN
    3.3.4 PEConfigure
    3.3.5 Existing Modules
4 Evaluation
  4.1 Evaluation Methodology
  4.2 EVPN Deployment Performance
  4.3 Module Performance Test

Paper III: Policy-based Routing and Load Balancing for EVPN-based Data Center Interconnections
1 Introduction
2 Background and Related Work
3 Use Cases
4 Proposed SDN-based Framework
  4.1 SDN Controller Modules
    4.1.1 The Neutron Module
    4.1.2 The L2VPN Service Module
    4.1.3 The Policy Manager (PM) Module
    4.1.4 The Strategy Manager Module
  4.2 Routing Policy Attributes
    4.2.1 Multi-Homing
    4.2.2 Load Balancing
    4.2.3 Bandwidth Reservation
  4.3 Policy Enforcement
  4.4 Exemplary Work Flow
5 Evaluation and Results
  5.1 Evaluation Methodology
  5.2 Policies
    5.2.1 No Multi-Homing (NO_MH)
    5.2.2 Multi-Homing
    5.2.3 Multi-Homing and Load Balancing (MHLB)
    5.2.4 Load Balancing, but Not Multi-Homing (LB_NO_MH)
    5.2.5 Bandwidth Guarantee QoS (QoS)
  5.3 Results
6 Conclusions and Future Work

Paper IV: SDN Enhanced Ethernet VPN for Data Center Interconnect
1 Introduction
2 Background
3 Proposed Architecture
  3.1 BUM Traffic Routing
  3.2 Multicast Tree Inside a DC
  3.4 Using SDN Controller for DF Selection
4 Evaluation
  4.1 Experimental Methodology
  4.2 DF Switch-Over
  4.3 SDN Controller Triggered DF Change
5 Conclusions

Paper V: EVPN/SDN Assisted Live VM Migration between Geo-Distributed Data Centers
1 Introduction
2 Design Challenges for Live VM Migration across the WAN
3 Background
  3.1 VXLAN and EVPN
  3.2 VM Mobility in EVPN
  3.3 Distributed Gateway using EVPN
4 Architecture and Implementation
  4.1 Controller Modules
    4.1.1 L2VPN-Service
    4.1.2 BGP-EVPN
    4.1.3 VXLAN-Manager
  4.2 Improving Network Convergence Time across DCs
  4.3 Addressing the Hair-Pinning Problem
5 Evaluation
  5.1 Intra Subnet
  5.2 Inter Subnet
6 Conclusions

Paper VI: On the Cost-Optimality Trade-off for Service Function Chain Reconfiguration
1 Introduction
2 VNF Reconfiguration
  2.1 Objective Function
  2.2 Input Parameters and Decision Variables
3 Numerical Results
4 Conclusion and Future Work

Paper VII: On the Cost-Optimality Trade-off for Fast Service Function Chain Reconfiguration
1 Introduction
2 Motivation
  2.1 Need for SFC Reconfiguration
  2.2 Reconfiguration Overheads
  2.3 VNF Replication or VNF Migration?
3 VNF Reconfiguration
  3.1 Objective Function
  3.2 Input Parameters and Decision Variables
  3.3 Reconfiguration Costs
  3.4 Other Reconfiguration Costs
4 Heuristic
5 Numerical Results
  5.1 Simulations Setup
  5.2 MOGA Parameter Tuning
  5.3 Evaluating the Trade-off between Optimality and Reconfiguration Cost using MOGA
    5.3.1 Trade-off with Heuristic
    5.3.2 Impact of Initial Placement and Deployment Technology
6 Related Work
  6.1 SFC Reconfiguration Operations
  6.2 Multi-Objective VNF Placement & Heuristic Solutions
  6.3 Seeded Multi-Objective Heuristics
7 Conclusion and Future Work
  7.1 Linearization
  7.2 Normalization

Paper VIII: OpenStackEmu - A Cloud Testbed Combining Network Emulation with OpenStack and SDN
2 OpenStackEmu Architecture and Implementation

Introductory Summary

“We are stuck with technology when what we really want is just stuff that works.”



1 Introduction

Server virtualization has fundamentally changed the way Data Centers (DCs) are built and operated. In server virtualization, the resources of a physical machine (server) are divided into multiple isolated and independent “virtual instances” like Virtual Machines (VMs) and containers. Therefore, a server in a DC may host multiple VMs or containers, providing different services and belonging to different customers. Inspired by the success of virtualization in the area of computing, virtualization has become the foundation for new concepts such as Network Function Virtualization (NFV) [41]. In NFV, services in the network, also known as Network Functions (NFs), e.g., firewalls, that are traditionally deployed on customized hardware are virtualized and shipped inside VMs and containers. Such Virtual Network Functions (VNFs) can be placed on commodity servers, utilizing the flexibility and scalability of modern virtualized DCs.

Modern virtualized DCs are an integral part of the cloud computing paradigm. In cloud DCs, tens of thousands of VMs are deployed over physical servers to provide services, enabling dozens of different types of applications and workloads, e.g., computing [1] and storage [4], to run and scale with a quality-of-service (QoS) guarantee. The cloud is a dynamic environment. Tenants join and leave frequently. Consequently, new services need to be provisioned and ongoing services ought to be terminated. Likewise, the demands for resources in DCs fluctuate over time, making resources over- or underutilized. To provide high-quality and cost-efficient cloud services, effective management of DC infrastructures is crucial. Through DC management, providers ensure that the services provided by a VM or a set of VMs fulfill the Service-Level Agreement (SLA) negotiated with the end-user. Most management tasks require or lead to a reconfiguration of the virtualized DC infrastructure. Using reconfiguration, DC providers transform the current state of deployed services in the DC, e.g., the current placement of the VMs that host services, into another state in a way that the new state realizes the provider's objective(s).

An important objective of modern DCs is to ensure high availability [52]. Therefore, most reconfiguration operations have to be performed online with a minimum negative impact on the performance of the services deployed in the DC. Online reconfiguration comprises several actions, including instantiation, termination, and live migration of the VMs that host services. In live migration, the entire VM, or a part of it, moves from one host to another without any service disruption. Among the reconfiguration actions mentioned above, live VM migration is the most challenging. First and foremost, live VM migration has to be performed fast and with minimum service disruption, or else the effectiveness of the reconfiguration procedure is open to question. Furthermore, live VM migration has to impose minimum overhead on the network; if not, it deteriorates the quality of the other services the DC provides. Despite these challenges, most online service reconfigurations require the support of live migration. As a result, it is crucial to investigate and address the problems of the live VM migration procedure.


In this thesis, we study the problem of online reconfiguration when it entails relocation of VMs, and in so doing attempt to tackle the following questions: (1) How can we improve live VM migration in Software-Defined Networking (SDN) enabled DCs by addressing its networking challenges? and (2) What is the trade-off between the reconfiguration cost and the optimality of the SFC placement after the reconfiguration has been applied?

1.1 Migration as Part of DC Management

The increased importance of cloud-based services in today's society has drawn a lot of attention to the efficient management of the cloud DC infrastructure. Server consolidation [16, 42, 45], load balancing [56], server maintenance [92], ‘follow the sun’ [125], and ‘follow the moon’ [120] are examples of intra- and inter-DC management operations that help DC providers improve the quality of service and reduce their operational costs.

Most DC management tasks, including all those mentioned above, are possible to implement if live migration of the VMs that host services is supported. Thus, supporting live VM migration is a ‘must have’ feature for the management of cloud DCs, specifically for new cloud architectures such as the mobile edge cloud [15]. Using edge clouds, end-users take advantage of the benefits of the cloud, e.g., high computation capacity, in their close proximity. As a result, mobile users may offload tasks, e.g., computation-intensive tasks, from their mobile devices to the edge cloud to overcome the resource limitations of their devices and extend battery life [86]. However, an end-user is subject to move. Thus, to maintain the benefits of a high-throughput, low-latency connection, the VM that hosts the service needs to be live migrated between servers in the same or different edge clouds [30, 80, 101].

Improving the quality of live migration enhances the efficiency of DC management. Ideally, live migration ought to be fast and seamless. This requires the migrating node to maintain its ongoing connections and experience negligible (preferably zero) service disruption and performance degradation. According to [91], the quality of a live migration procedure can be improved in either of two ways: (1) by improving the live migration scheme used by the hypervisors, and (2) by improving the performance of the migration at the network level [72]. Although research that focuses on the former is significant (see [128]), the impact of the DC network on conducting a seamless live migration is less investigated.

In the first part of this thesis (Papers I-V), we seek solutions that improve live VM migration by addressing pertinent networking challenges. In particular, two networking challenges have been targeted:

1) Network Convergence Time: The total time required for network devices to update their routing tables according to the latest changes in the network (e.g., a VM migration) is known as the network convergence time. In legacy networks, the procedure of network convergence is postponed until after the completion of the live migration, which consequently adds to the service disruption time [129].


2) Suboptimal Routing Problem: When a VM migrates, the network needs to find a new route to the VM. Using a suboptimal path for the ingress and egress traffic of a migrating node can significantly degrade the service performance and introduce problems such as congestion.

The inflexible and decentralized structure of legacy networks is the main obstacle that prevents them from addressing the networking challenges of live migration. Conversely, distinctive characteristics of the SDN paradigm, such as its holistic view of the network and its tight integration with cloud management platforms, e.g., OpenStack [11] and Kubernetes [7], provide a unique opportunity to improve live VM migration. This thesis proposes to improve the quality of intra- and inter-DC live VM migrations by harnessing SDN to accelerate network convergence and address the suboptimal routing problem.

In addition to the networking challenges mentioned above, conducting a live migration in legacy DC networks is an intricate task, as it requires notable effort to configure the underlying network, for instance, to provide interconnection among DCs. Configuration of large-scale networks such as DCs is a labor-intensive, time-consuming, and error-prone task. Managing the deployed technologies and addressing their problems, e.g., scalability, are further challenges that need to be solved after a technology has been deployed. Limitations of network technologies may degrade the quality of live service migration. For instance, if the technology used to interconnect DCs falls short in managing broadcast traffic properly, it may not be able to transfer migration traffic efficiently.

Ethernet Virtual Private Network (EVPN) [37], which is a layer-2 interconnection solution, is one of the technologies that can play a key role in inter-DC migration scenarios. When remote sites are interconnected through EVPN, migrating instances can retain their ongoing connections, which is essential for the live migration procedure. Moreover, EVPN is designed with features to advertise the migration of nodes in the network. Therefore, in this thesis, we attempt to improve EVPN operations. On the basis of the SDN architecture, we develop a framework inside the OpenDaylight (ODL) [10] controller to automate the deployment and improve the management of EVPN-based DC interconnections. We extend this framework to improve the performance and scalability of such interconnections by deploying routing policies and handling broadcast traffic more efficiently.

1.2 Cost of Service Function Chain Reconfiguration

When two or more NFs are interconnected to provide a given end-to-end service, they form a chain, known as a Service Function Chain (SFC). In the context of NFV, an SFC is a sequenced chain of VNFs. SFC is an enabling technology for the flexible management of specific traffic: it provides solutions for classifying flows and enforces policies along routes according to the service requirements of the flow [22].

Reconfiguration of SFCs is required in various circumstances, and live migration is an integral part of this operation. For instance, reconfiguration of SFCs may help to increase the resiliency of NFV infrastructures [49]. Moreover, optimizing a suboptimal placement of SFCs is one of the important use cases of SFC reconfiguration, where potentially some VNFs have to be migrated to realize the target placement. Since the optimal placement of the VNFs in an SFC has a significant impact on the SFC performance, comprehensive research has been carried out on the optimal placement of VNFs [70]. However, the overheads of transforming a suboptimal placement into the optimal one have mainly been overlooked. Typically, network reconfiguration is a challenging operation, as it may lead to severe service disruption if it is not well executed. This problem is exacerbated for SFCs, as a service disruption caused by the reconfiguration of a VNF may lead to severe performance degradation of the nodes that follow the node in question in the SFC, and consequently of the chain as a whole. Moreover, reconfiguration of SFCs may impose a significant overhead on the underlying network and degrade the performance of the services deployed in the network. For example, the reconfiguration of SFCs may entail migration of VNF states, which consumes network resources and imposes additional CPU load on both the source and destination servers. Therefore, it is crucial to study the reconfiguration overhead, as it may hamper the gain that can be achieved by transforming a non- or suboptimal SFC placement into an optimal one.

Different parameters can reflect the overhead of reconfiguration. The total time that a given SFC is unable to provide service due to the migration of its VNF(s), the revenue loss due to service disruption, and the overhead imposed on the end-host servers by the migration procedure are examples of parameters that can represent the overhead of reconfiguration. In the second part of this thesis (Papers VI-VII), we study the overhead of online SFC reconfiguration. We model different costs of reconfiguration and formulate an optimization problem. By solving the problem both mathematically and with a fast heuristic, for real-world topologies and various system parameters, we study the trade-off between the optimality of a new SFC placement after reconfiguration and the total reconfiguration cost. These studies provide insights into the importance of considering the overhead of a reconfiguration before performing it.
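To illustrate the nature of this trade-off, the following toy sketch weighs an invented reconfiguration cost (state transfer time plus revenue lost during the disruption) against a placement gain with diminishing returns. All cost terms and numbers are made up for illustration; they are not the cost models formulated in Papers VI and VII.

```python
# Toy illustration of the reconfiguration cost vs. placement optimality
# trade-off. All terms, weights, and numbers are invented for this sketch.

def reconfiguration_cost(moves, state_mb, bw_mb_per_s, revenue_per_s):
    """Cost of migrating `moves` stateful VNFs, each with `state_mb` of state."""
    transfer_time = moves * state_mb / bw_mb_per_s   # network occupancy
    disruption_loss = transfer_time * revenue_per_s  # revenue lost meanwhile
    return transfer_time + disruption_loss

def placement_gain(moves, gain_per_move=3.0):
    """Diminishing returns: each extra migration improves the placement less."""
    return sum(gain_per_move / (i + 1) for i in range(moves))

if __name__ == "__main__":
    for moves in range(6):
        net = placement_gain(moves) - reconfiguration_cost(
            moves, state_mb=512, bw_mb_per_s=1000, revenue_per_s=2.0)
        print(f"{moves} migrations -> net benefit {net:+.2f}")
```

With these invented numbers, the net benefit first grows and then turns negative as more migrations are performed, mirroring the observation that the reconfiguration cost can hamper the achievable gain.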

1.3 Thesis Structure

The remainder of this thesis is organized as follows. Section 2 provides an overview of the concepts and technologies used throughout this thesis. Related work is discussed in Section 3. The research questions and contributions are outlined in Sections 4 and 5, respectively. The research methods employed by the appended papers are discussed in Section 6. Section 7 provides a summary of all the appended papers. Finally, Section 8 concludes the introductory summary.


2 Background

This section provides the necessary background to understand the topics under discussion in this thesis.

2.1 Concepts

In the following, we provide an overview of live service migration and its role in different cloud architectures, followed by an introduction to SDN, network virtualization, and NFV.

2.1.1 Live Service Migration

The ability to migrate a virtual node between physical servers while it is providing a service is a prerequisite for dynamic and effective DC management. We refer to this operation as live service migration. The need for live service migration stems from two main facts: (1) the mobility of end-users, and (2) the inappropriate placement of services in the network. Due to the importance of this topic, a large number of solutions have been proposed over the last three decades to provide service migration. These solutions can be categorized into two main groups [128]: (1) process migration, and (2) virtual instance migration. The idea of process migration has been proposed as a way to support seamless connectivity in vehicular environments, in which a single process that belongs to a given user migrates from one end-host to another. In the latter group, a virtual instance that hosts a process of a given user migrates between servers. Residual dependency is the problem that prevents process migration from being used in the management of today's DCs [128]. In contrast, migration of the entire virtual instance, e.g., the whole VM, ensures that the process can continue operation on any server across the DC network, as the virtual instance itself provides all the requirements of the process. Moreover, live migration of virtual instances is safer to deploy, as virtual instances are clearly isolated from each other, making the consequences of live migration more predictable.

VMs and containers are virtualization techniques that are widely used in today’s DCs. In the following, we describe the main migration schemes for both VMs and containers and discuss the performance metrics used to assess the quality of live migration. Note that in this thesis, we assume services are deployed over lightweight VMs. Hence, live service migration happens in the form of live VM migration.

VM Migration: A VM migrates from one physical machine to another by transferring its states, such as CPU, associated memory, and storage. There are three types of VM migration: hot, warm, and cold. Hot and warm migrations are also known as live migration, and cold migration is called non-live. In a hot migration [32], a VM transfers its states from one physical server to another while it is active and providing a service. In a warm migration [14], a new instance of the VM is created at the destination node and synchronization is carried out between them until the administrator calls the switch-over operation between the instances. In a cold migration, the operating system and applications on a VM are suspended, the states of the VM are transferred to the destination host, and the VM then continues its operation. Each type of VM migration has predefined use cases. For instance, a cold migration is recommended for applications that can accept a downtime, such as those that are rarely used. However, live migration is suitable for applications that provide mission-critical tasks or are sensitive to data loss. Considering the requirements of DCs in providing high availability, live VM migration is the dominant type of migration.

For live migration of a VM, three different schemes have been proposed: pre-copy [32], post-copy [54], and hybrid [107]. All these migration schemes comprise the following phases: (1) initialization, (2) reservation, (3) iteration, (4) stop-and-copy, (5) commitment, and (6) activation. Initialization and reservation are conducted before the VM states are transferred to the destination. In the initialization, the host is checked for compatibility of images, CPU architecture, etc., and during the reservation, the resources required for the new VM are reserved on the destination host. In the iteration phase, the system states of the VM are transferred from the source to the destination node while the VM is still providing service. In the stop-and-copy phase, the VM stops servicing clients at the source node and transfers its latest system states, including the modified or remaining memory pages, to the destination node. The last two steps follow the stop-and-copy phase. In the commitment phase, the destination host acknowledges receiving a consistent copy of the VM, and finally the VM starts its operation after the activation phase [18]. Once the network devices (routers and switches) are informed about the new location of the VM, they update their routing/forwarding information and steer the traffic of the VM to its new location.
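To make the iterative and stop-and-copy phases concrete, the sketch below simulates a pre-copy migration under strong simplifying assumptions: a constant page-dirtying rate, a constant transfer bandwidth, and an illustrative stop-and-copy threshold that does not come from any particular hypervisor.

```python
# Toy simulation of the iterative (pre-copy) phase of live VM migration.
# Assumptions (illustrative only): a constant page-dirtying rate and a
# constant bandwidth; real hypervisors adapt both dynamically.

def pre_copy(mem_mb, dirty_mb_per_s, bw_mb_per_s,
             stop_copy_threshold_mb=50, max_rounds=30):
    """Return (total_migration_time_s, downtime_s) for a toy pre-copy run."""
    to_send = mem_mb          # round 1 transfers the whole memory image
    total_time = 0.0
    for _ in range(max_rounds):
        round_time = to_send / bw_mb_per_s
        total_time += round_time
        # Pages dirtied while this round was in flight must be resent.
        to_send = dirty_mb_per_s * round_time
        if to_send <= stop_copy_threshold_mb:
            break
    # Stop-and-copy: the VM is paused while the residual dirty pages move.
    downtime = to_send / bw_mb_per_s
    return total_time + downtime, downtime

if __name__ == "__main__":
    total, down = pre_copy(mem_mb=4096, dirty_mb_per_s=100, bw_mb_per_s=1000)
    print(f"total migration time ~ {total:.2f}s, downtime ~ {down:.3f}s")
```

Under these assumptions, a dirtying rate approaching the available bandwidth inflates both the number of iterations and the final downtime, which is one reason the network-level support discussed below matters.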

Although exploiting an optimal migration scheme to transfer the system states is crucial, a seamless live migration needs comprehensive support from the underlying network. Maintaining ongoing connections and avoiding performance degradation after the migration are among the challenges that require network-level solutions. Additionally, the network can play a key role in reducing the total migration time and the downtime, for instance, by using an appropriate path for state migration. In Section 3, we consider the networking challenges inherent in live migration and summarize several solutions presented to address them.

Container Migration: Virtualization technology has improved significantly over the last decade. In addition to performance improvement, moving towards lightweight virtualization is a recent trend. Containers and light VMs (e.g., LightVMs [83]) are examples of recent efforts to replace large-sized VMs with lightweight counterparts that offer fast instantiation times and small per-instance memory footprints. In contrast to VMs that abstract the entire machine, containers are implemented by OS virtualization and share the underlying OS kernel as well as user-space libraries.


Live migration of containers that run stateless services (e.g., RESTful Web services) amounts to terminating a container on the source server and instantiating a new container on the destination server. For containers that host stateful services, Checkpoint and Restart (CR) [88] is the main migration technology used. Using CR, the memory state of a process is saved into files and the process resumes at the destination host from the saved checkpoint. As mentioned before, containers are lighter than traditional VMs, which makes their migration less challenging in terms of the overhead that migration imposes on the network and servers. However, the migration of containers is not always more straightforward, as the quality of migration depends on various parameters such as the size of the states and their update frequency. Moreover, in contrast to a migrated VM, which can run on any server managed by the same virtual machine manager, a container can only migrate to servers that support all the requirements of the given container, including essential libraries [128].
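As a sketch of the CR workflow, the snippet below drives the CRIU tool from Python. It assumes CRIU is installed and run with sufficient privileges; the PID and image directory are placeholders.

```python
# Minimal sketch of Checkpoint and Restart (CR) with CRIU, driven from
# Python. Assumes CRIU is installed and sufficient privileges; the PID and
# the image directory below are placeholders.
import subprocess

def checkpoint(pid: int, images_dir: str) -> None:
    # Freeze the process tree rooted at `pid` and dump its state to files.
    subprocess.run(["criu", "dump", "-t", str(pid), "-D", images_dir,
                    "--shell-job"], check=True)

def restore(images_dir: str) -> None:
    # Resume the process from the saved checkpoint (on this or another host,
    # once the image directory has been transferred there).
    subprocess.run(["criu", "restore", "-D", images_dir, "--shell-job"],
                   check=True)

if __name__ == "__main__":
    checkpoint(pid=1234, images_dir="/tmp/ckpt")   # placeholder PID
    restore(images_dir="/tmp/ckpt")
```

Transferring the image directory between hosts is what turns this local checkpoint into a migration.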

Performance Metrics: An ideal live migration is fast and seamless, and imposes minimum overhead on the network. The quality of a live migration can be evaluated by several performance metrics, such as the total migration time and the downtime. The amount of time during which the migrating node stops executing, both at the source and at the destination host, is called the downtime. Reducing the downtime is one of the main objectives of a migration scheme. Service degradation, network bandwidth utilization, and total migration traffic are other performance metrics used to assess the quality of live migration [128].

2.1.2 Data Center, Cloud, and Edge

With the increasing importance and size of data, a growing number of businesses are migrating from in-house data and system management to cloud computing. Cloud computing has become very popular among businesses as it improves the efficiency, scalability, flexibility, and availability of the business, while it reduces capital and operational costs by removing the cost of running own centralized computing networks and servers.

Nowadays, cloud computing offers a wide range of services and applications to its users, including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These applications and services are supported by the service providers' DCs. To improve the scalability, availability, and resiliency of a DC infrastructure, cloud providers establish multiple DCs scattered over a wide geographical region. Although distributed DCs have increased the quality of cloud services, they fail to address the strict requirements of the new generation of cloud-based services, such as latency-sensitive IoT applications. Therefore, a new cloud architecture, known as the edge cloud, has been proposed. In an edge cloud, the resources of the cloud are distributed in smaller but more numerous clusters. Unlike large DCs sitting at the center of the cloud, these small clusters are placed at shorter distances from the end-users to provide cloud services with lower latency.


Regardless of the cloud architecture (either a centralized DC or an edge cloud), the cloud has to be managed effectively to deliver high quality services. Live service migration is the key technology required by the majority of cloud management operations. For instance, to prevent a waste of resources, cloud providers may need to consolidate servers. Server consolidation needs VM migration to pack currently running workloads from all servers onto a minimum number of physical machines. Considering the importance of high availability for modern DCs, most of the migrations have to happen live, since cold migration means revenue loss. Likewise, in the edge cloud, supporting mobility is crucial. The mobility of end-users in conjunction with the limited coverage of edge servers may result in significant performance degradation or even disruption of the service. To ensure service continuity and high-throughput, low-latency connections, it is essential to realize seamless service migration. For more information about recent research on service migration in the edge cloud, we refer the reader to [123].

2.1.3 Software-Defined Networking

In legacy networks, the control and data planes are tightly coupled in the same device. The control plane is responsible for control decisions, e.g., decisions about where to send traffic. The data plane, on the other hand, is responsible for processing packets fast according to the control plane's decisions. Such a coupled design contributes to less flexible network management and limits the scalability of the network [94]. SDN entails the decoupling of the control plane from the forwarding plane. In SDN, control plane functions are offloaded to a centralized controller [87]. Hence, rather than letting each node in the network make its own forwarding decisions, a centralized software-based controller is responsible for instructing subordinate hardware nodes on how to forward traffic. Such a network is said to be defined by the software running on the centralized control plane. The resulting benefits include support for multi-vendor environments, more granular network control, improved automation and management, accelerated service deployments, and unprecedented flexibility at lower cost.

OpenDayLight Controller: There are currently a large number of SDN controllers provided by vendors and open-source communities. Controllers differ in their architecture, features, and supported standards for external communication through the northbound and southbound interfaces. The SDN-based solutions proposed in this thesis to address the networking challenges of live VM migration are implemented on a well-known SDN controller, ODL. ODL is an open-source controller hosted by the Linux Foundation and supported by many SDN vendors, industry, and an SDN community committed to collaborating and cooperating in building a unique SDN framework. The project is based not only on the OpenFlow standard but also on an extensive set of protocols aimed at encouraging and providing solutions for SDN and NFV technologies. ODL is built on a collaborative development of modules across the framework to both extend existing standards and create new standards and novel solutions. Therefore, both industry and developers can benefit from working together by creating new technologies and enhancing existing products, developing new standards, or devising solutions to mitigate current problems such as high energy consumption and low cross-section bandwidth. For more information about the ODL architecture, its available modules, and its features, we refer the reader to [115].

2.1.4 Network Virtualization

Since server virtualization has proved itself a promising technology, it has also been considered for communication networks. Network virtualization is a technology that aims at dividing a given physical network infrastructure into a set of logically isolated Virtual Networks (VNs), also known as network slices [25]. The rationale behind slicing the network is to improve the utilization of the networking resources [20] and to allow network providers to offer a variety of services over an existing physical network infrastructure [61].

Network slicing will play a key role in meeting the demands of the 5G use cases and its underlying cost requirements [94]. In 5G systems, providers will instantiate network slices and dedicate them to different types of services with different characteristics and requirements. For instance, a provider can create two slices and use each slice for different use cases: one slice for massive IoT devices and another slice for autonomous cars [93]. Each network slice will have its own resources, e.g., link bandwidths, its specific view of the network topology, and a set of network functions to meet the requirements of the service in question.
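As a minimal data-structure sketch of this idea (all names, numbers, and fields below are invented for illustration; real slicing systems are far richer):

```python
# Minimal sketch of a network slice as a data structure: dedicated link
# resources, a restricted view of the topology, and a set of network
# functions. All names and numbers are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class NetworkSlice:
    name: str
    link_bandwidth_mbps: dict           # (node, node) -> reserved bandwidth
    topology_view: set                  # nodes visible to this slice
    network_functions: list = field(default_factory=list)

iot_slice = NetworkSlice(
    name="massive-iot",
    link_bandwidth_mbps={("s1", "s2"): 100},
    topology_view={"s1", "s2"},
    network_functions=["firewall", "nat"],
)
car_slice = NetworkSlice(
    name="autonomous-cars",
    link_bandwidth_mbps={("s1", "s3"): 1000},   # low-latency, high-bandwidth
    topology_view={"s1", "s3"},
    network_functions=["firewall", "scheduler"],
)
print(iot_slice.name, car_slice.name)
```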

2.1.5 Network Function Virtualization

In addition to forwarding and routing functions, DC and enterprise networks deploy a variety of intermediary services, for instance, to distribute load (e.g., load balancers) and to enable remote connectivity (e.g., VPNs) [102]. Traditionally, these services are hosted on dedicated physical appliances known as middleboxes. However, deploying network services on middleboxes brings several limitations. Due to their physical nature, middlebox deployments limit the flexibility of the network. Furthermore, network administrators need to provision enough middleboxes to cope with the predicted peak traffic, which increases capital expenditures. The workload in a network, on the other hand, fluctuates radically over the course of a day. Consequently, the deployed middleboxes become underutilized when the workload is not at its peak.

Advances in server virtualization technologies, in conjunction with the increase in the processing power of commodity servers, paved the way for the NFV concept. In NFV, network functions previously realized on costly hardware platforms are replaced by virtual counterparts, termed VNFs, that can be placed on any low-cost commodity hardware, on demand, and at any location in the network. Thus, NFV contributes to increasing the flexibility and scalability of modern networks while reducing capital and operational costs.

VNF and SFC: VNFs can be categorized into two groups: (1) state-agnostic or stateless, and (2) stateful. The former type of VNF (e.g., stateless load balancers) either does not store flow states or keeps them on a remote data store infrastructure (e.g., using key-value stores) and retrieves them on demand [59]. Stateful VNFs, on the other hand, maintain states locally due to performance or design concerns. Network address translators, WAN optimizers, and intrusion prevention systems are examples of stateful VNFs that all maintain dynamic states of flows, users, and network conditions [49, 110].

Typically, multiple VNFs cooperate by exchanging traffic in order to provide a given end-to-end service, forming a set of SFCs. Network providers may deploy SFCs either manually or dynamically using a Network Service Header (NSH) [103], which is a new data plane protocol to define SFCs. In an SFC, the order of service function traversal is defined. When a traffic flow passes an SFC, the service function instance with the higher priority applies a function to the packets of the given flow before forwarding the packets to the next service node.

SFC Reconfiguration: Network reconfiguration is a general concept with numerous use cases in all types of networks. Depending on the type and requirements of the network, reconfiguration encompasses different actions. In the context of SFCs, reconfiguration includes the following sets of actions: (1) re-routing traffic flows, either between the same or subsequent VNF instances of an SFC, (2) instantiation and termination of VNFs, and (3) relocation of VNFs. Re-routing traffic flows is required, for example, when contention between flows on network links has degraded the performance of the service an SFC provides; in such a scenario, changing the flow routing may address or mitigate the problem. Traffic re-routing is also required by the other SFC reconfiguration actions, including the instantiation, termination, and relocation of VNFs. Instantiation of a new VNF and termination of an existing instance are actions required to cope with changes in VNF demands. Finally, the relocation of VNFs is needed when the current placement of a VNF does not satisfy the requirements. For stateless VNFs, relocation is equivalent to the termination of an existing VNF and the creation of a new, identical one. In contrast, when a VNF requires its states for future decisions and the states are locally attached to the running instance of the VNF, relocation is known as migration.
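A compact sketch of the relocation action, on a chain modeled as an ordered list of VNF instances, distinguishing the stateless and stateful cases described above (all names and fields are illustrative):

```python
# Sketch of SFC relocation on a chain modeled as an ordered list of VNF
# instances. Names and fields are illustrative only.
from dataclasses import dataclass

@dataclass
class VNF:
    name: str
    host: str
    stateful: bool
    state_mb: int = 0

def relocate(chain: list, name: str, new_host: str) -> str:
    """Relocate one VNF; stateful instances must be live migrated."""
    vnf = next(v for v in chain if v.name == name)
    if vnf.stateful:
        action = f"live-migrate {vnf.state_mb} MB of state to {new_host}"
    else:
        # Stateless: terminate here, re-create an identical instance there.
        action = f"terminate on {vnf.host}, re-create on {new_host}"
    vnf.host = new_host
    # Flows between neighbouring VNFs must be re-routed in either case.
    return action

chain = [VNF("classifier", "h1", stateful=False),
         VNF("nat", "h2", stateful=True, state_mb=256),
         VNF("ips", "h3", stateful=True, state_mb=512)]
print(relocate(chain, "nat", "h4"))
```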

Dynamic reconfiguration of SFCs is required in various scenarios, for instance, to conduct zero-downtime server maintenance [41], manage unexpected failures [102], and consolidate a suboptimal placement of VNFs. When the reconfiguration is online and the SFCs are composed of stateful VNFs, live migration of VNFs is required.


2.2 Technologies

This section elaborates on the technologies used to improve the live VM migration procedure, including SDN-based resiliency mechanisms, Virtual Extensible LAN (VXLAN), EVPN, and model-driven network management technologies.

2.2.1 SDN-based Resiliency Mechanisms

SDN mechanisms that are designed to cope with network failures are classified into two general approaches: reactive and proactive [127]. The reactive scheme requires communication between a switch and its controller to provide backup paths dynamically. Once a link or node fails, the controller has to be notified, and it then reacts by finding an alternative path. Depending on the workload of the controller, this procedure may require a significant amount of time. In contrast, in proactive schemes the network is designed in advance to cope with failures. In OpenFlow-enabled networks, proactive schemes are typically implemented using OpenFlow group tables. A flow rule in an OpenFlow group table can be defined with several action buckets, in which the actions are conditioned on status parameters. A predefined action is then executed locally, without the involvement of the controller, once the parameters change.

The Fast Failover group is one of the OpenFlow group tables and is designed to detect and overcome port failures. Initially, the SDN controller defines one primary bucket and one or several backup buckets. Only one bucket can be used at a time, and it will not be changed unless the liveness of the currently used bucket's watch port/group changes from ‘up’ to ‘down’. When such an event occurs, the Fast Failover group quickly selects the next bucket in the bucket list with a watch port/group that is ‘up’ [99]. For detecting the port/link state, Bidirectional Forwarding Detection (BFD) [23] is a commonly used technology. BFD determines the state of a port by establishing a connection using a three-way handshake, and subsequently sending periodic control messages over the link. If no response message is received within a specified interval, the link is considered ‘down’. Once BFD detects a link-down event, the link state is set to down, which automatically triggers the next predefined bucket in the Fast Failover group table. To prepare the network for intra-DC VM migration, several OpenFlow-based proactive mechanisms have been exploited in Paper I.
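As an illustration of how such a group can be installed from a controller application, the following sketch uses the Ryu framework; the port numbers and the group id are placeholders, and configuring BFD on the switch itself is out of scope of the snippet.

```python
# Sketch of installing an OpenFlow Fast Failover group from an SDN
# controller application, written against the Ryu framework. Port numbers
# and the group id are placeholders.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls

class FastFailoverApp(app_manager.RyuApp):
    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def install_fast_failover(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        # One bucket per candidate output port. A bucket is usable only while
        # the liveness of its watch_port is 'up'; the switch fails over to
        # the next live bucket locally, without involving the controller.
        buckets = [
            parser.OFPBucket(watch_port=1,
                             actions=[parser.OFPActionOutput(1)]),  # primary
            parser.OFPBucket(watch_port=2,
                             actions=[parser.OFPActionOutput(2)]),  # backup
        ]
        dp.send_msg(parser.OFPGroupMod(dp, ofp.OFPGC_ADD, ofp.OFPGT_FF,
                                       1, buckets))
        # Steer traffic arriving on port 3 into the group.
        match = parser.OFPMatch(in_port=3)
        actions = [parser.OFPActionGroup(group_id=1)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                      match=match, instructions=inst))
```

The fail-over decision is taken entirely in the switch's data plane; the controller only pre-installs the buckets.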

2.2.2 VXLAN

The ability to maintain ongoing connections is a prerequisite for seamless VM migration. If a VM gets a new network configuration after migration, its ongoing connections have to be reestablished, which violates the seamlessness of live VM migration. To conduct a seamless migration, the network can either provide an opportunity for the migrating VM to maintain its configuration (e.g., its VLAN) or convert the old configuration to the new one by manipulating the south and north traffic of the VM. However, many of the proposed solutions face major scalability problems. Hence, they are not applicable to multi-tenant environments such as modern virtualized DCs. For instance, with a 12-bit VLAN ID, a DC provider can create up to 4094 isolated networks using VLAN technology; due to the large number of tenants that a cloud provider might service, the VLAN limit is inadequate. VXLAN [82] is an overlay technology that provides an L2 extension over a shared L3 underlay infrastructure network by using MAC-in-IP/UDP tunneling encapsulation. In a VXLAN-based network, a VM can retain its network configuration after migration while the multi-tenancy requirements, such as scalability and isolation, are met.

[Figure 1: VM-to-VM communication in a VXLAN-based network. (a) Before migration. (b) After migration.]


Figure 1 demonstrates traffic routing in a DC network where VXLAN technology is deployed. We assume that VM-1 is communicating with VM-2 while it migrates from Host-1 to Host-3. Figure 1a depicts the communication procedure before the migration. When VM-1 sends an IP packet destined to VM-2, the packet is encapsulated into a normal Ethernet frame and is sent to its assigned VTEP (VXLAN Tunnel EndPoint) (see the inner header). VTEPs are responsible for encapsulating and decapsulating VXLAN traffic upon arrival at or departure from a VXLAN tunnel. Once an Ethernet frame arrives at VTEP-1, it adds an 8-byte VXLAN header and puts the whole frame as payload inside a new UDP datagram with the destination address of VTEP-2 (see the outer header). The IP-based backbone transfers the datagram from VTEP-1 to VTEP-2 according to the destination IP address of VTEP-2. Upon reception, VTEP-2 decapsulates the datagram and forwards its content to the final destination (VM-2).

Let us assume that the network provider migrates VM-1 from Host-1 to Host-3. After the migration, VM-1 sends gratuitous ARP or RARP packets to notify all VTEPs about the migration event. Upon receiving the notification, the VTEPs update their corresponding tables for future forwarding [82]. As can be seen in Figure 1b, VM-1 retains its network configuration after the migration (see the inner header in Figure 1b); however, the information in the outer header differs, since the source VTEP has changed.
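The notification step can be pictured as follows: a gratuitous ARP is an unsolicited ARP reply in which the sender answers for its own IP address. The sketch below uses Scapy; the addresses and interface name are illustrative.

```python
# A sketch of the gratuitous ARP emitted after migration so that VTEPs
# relearn VM-1's new location (addresses and interface are placeholders).
from scapy.layers.l2 import ARP, Ether
from scapy.sendrecv import sendp

vm_mac, vm_ip = "00:00:00:00:01:01", "10.0.0.1"
garp = (Ether(src=vm_mac, dst="ff:ff:ff:ff:ff:ff") /
        ARP(op=2,                      # unsolicited ARP reply ("is-at")
            hwsrc=vm_mac, psrc=vm_ip,  # sender: the migrated VM itself
            hwdst="ff:ff:ff:ff:ff:ff",
            pdst=vm_ip))               # target IP = own IP (gratuitous)
sendp(garp, iface="eth0")
```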

In this thesis, we assume that the VXLAN overlay technology is deployed inside all DCs. As a result, a VM can retain its network configuration while it migrates within a DC network.

2.2.3 EVPN

Although overlay technologies such as VXLAN are widely deployed in today's DC networks, they are not designed to be a DC interconnect solution [76]. Extending the overlay network across DCs expands the broadcast domain from one DC network to another, which introduces scalability, efficiency, and security problems. A practical DC interconnection technology should provide the basics of a solid and efficient inter-DC connection, such as control-plane learning, multi-homing, and ARP caching [76].

EVPN encompasses the next-generation Ethernet L2VPN (Layer 2 Virtual Private Network) solutions and has been designed to provide per-flow load balancing, enhance flexibility, improve scalability, and decrease the operational complexity of existing L2VPN solutions. EVPN aligns the well-understood technical and operational principles of IP VPNs with Ethernet services by utilizing MP-BGP in the control plane as the signaling method, which removes the need for traditional flood-and-learn¹ in the data plane. EVPN in conjunction with the VXLAN overlay technology is an appropriate solution to span layer 2 domains between multiple DCs [105].

EVPN comprises four types of messages: Ethernet auto-discovery, MAC/IP advertisement, inclusive multicast, and Ethernet segment. In the following, we briefly describe the MAC advertisement message and its corresponding extended communities, as they have been used in this thesis. For the description and use cases of the other routing messages, we refer the reader to [37].

¹In the context of L2VPNs, flood-and-learn is the procedure of disseminating MAC address reachability information by flooding frames in the data plane and learning source MAC addresses from received traffic.

MAC Advertisement: The EVPN MAC/IP advertisement message is designed to advertise the MAC/IP reachability information of VMs. When an EVPN-capable node learns about a new MAC address, it advertises the information to its peers through the MP-BGP protocol. All remote peers that belong to the same EVPN instance import this route and insert the announced MAC address and its reachability information (e.g., the Ethernet tag) into their MAC VRF (Virtual Routing and Forwarding) table. This process allows the remote nodes to know where to send traffic [37].

VM Mobility: By adding an extended community to the MAC/IP advertisement message, EVPN-capable nodes can update each other on VM movements. Each MAC mobility event for a given MAC address carries a sequence number that is incremented with every MAC move, which EVPN-capable nodes use to ensure that MAC advertisements are processed in the correct order. An EVPN-capable node advertises a MAC address for the first time with no MAC Mobility extended community attribute. When another EVPN-capable node detects a locally attached MAC address for which it had previously received a MAC/IP advertisement route, it re-advertises the MAC address in a MAC/IP advertisement route tagged with a MAC Mobility extended community attribute whose sequence number is one greater than the last received sequence number [37]. Figure 2 illustrates an EVPN operational scenario. A PE (PE-1) advertises a newly learned MAC address provisioned on a customer network (DC-1) to its peers (PE-2 and PE-3) with no additional extended community attribute (Figure 2a). Later, the VM migrates between the remote sites. As shown in Figure 2b, the PE of the DC on the right side (PE-3) re-advertises the MAC address of the migrated VM with an updated sequence number, together with some other updated parameters (e.g., the route distinguisher (RD) and MPLS label), in the MAC advertisement message. Although advertising the migration with the EVPN approach is an improvement over its predecessor methods, it still postpones network convergence until after the migration has been conducted, which prolongs the service disruption period. In Paper V, we propose an SDN-based solution to further improve the migration advertisement and, consequently, the network convergence time.
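As a rough model of the sequence-number handling described above, the fragment below sketches how an EVPN-capable node might decide when to increment the sequence number and whether to import a received advertisement. The data structures and function names are hypothetical, and the tie-breaking rules of [37] are omitted.

```python
# An illustrative model of EVPN MAC mobility sequence-number handling.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MacRoute:
    mac: str                   # advertised MAC address
    next_hop: str              # reachability information (e.g., remote VTEP)
    seq: Optional[int] = None  # None: no MAC Mobility extended community,
                               # i.e., the first advertisement of this MAC

mac_vrf: Dict[str, MacRoute] = {}  # best route per MAC in the MAC VRF

def seq_of(route: Optional[MacRoute]) -> int:
    return 0 if route is None or route.seq is None else route.seq

def on_received_advertisement(route: MacRoute) -> None:
    """Import a peer's MAC/IP advertisement only if it is at least as new."""
    if seq_of(route) >= seq_of(mac_vrf.get(route.mac)):
        mac_vrf[route.mac] = route

def on_local_mac_detected(mac: str, local_next_hop: str) -> MacRoute:
    """A locally attached MAC that was previously learned from a peer is
    re-advertised with the sequence number incremented by one."""
    new_seq = None if mac not in mac_vrf else seq_of(mac_vrf[mac]) + 1
    route = MacRoute(mac, local_next_hop, new_seq)
    mac_vrf[mac] = route
    return route  # to be advertised to all EVPN peers via MP-BGP
```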

Distributed Gateway: EVPN offers a unique and scalable solution that allows gateways to be actively distributed across an arbitrary number of network elements. This is especially relevant in cloud environments, where a tenant may exist or migrate anywhere in the network. Using the combination of the MAC/IP advertisement message and a default gateway extended community, an EVPN-capable node can distribute its gateway information to its peers. The remote peers treat the received MAC/IP address as equivalent to their own gateway interface for gateway processing. Thus, the gateway is distributed across all DC networks that are part of the same EVPN instance.

Figure 2: EVPN MAC mobility scenario. (a) Initial advertisement. (b) Advertisement after the first migration.

In Paper V, EVPN is the key technology used for inter-DC VM migration. First, it is the solution that interconnects the remote sites. Second, its capability to advertise a migration is improved so as to decrease the network convergence time. Finally, its capability to distribute gateway information, implemented as distributed gateway functionality in the SDN controller, is used to resolve the suboptimal routing problem that usually emerges after long-haul migrations.

2.3 Model-Driven Network Management

With the increasing size of networks and the emergence of large DCs, configuring network devices has become more difficult for infrastructure providers. Large networks are usually multi-vendor environments in which each network element is configured differently, e.g., through a different command-line interface. As a result, there is a clear need across the industry to simplify the configuration and management of both networks and devices. Model-driven network management automates and accelerates the procedure of creating services throughout the whole network. In model-driven network management, a data model is used to represent configurations, together with standard protocols to transmit the modeled data. YANG [24] has positioned itself as the data modeling language for representing network device configurations, remote procedure calls, and notifications in a standard way. Data defined in YANG is transmitted to a network device using a protocol such as NETCONF [39].

Over the last couple of years, YANG and NETCONF have gained traction in the networking industry, and there is a growing set of products from all vendors supporting YANG as the data modeling language and NETCONF as the network management protocol. In Paper II, the SDN controller uses model-driven network management to automate the configuration of EVPN instances on the edge routers of a DC.
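As a sketch of how a controller might automate such configuration, the fragment below pushes YANG-modeled instance data over NETCONF using the ncclient library. The device address, credentials, and the XML payload, including its namespace, are illustrative placeholders rather than an actual vendor model.

```python
# A sketch of configuring an EVPN instance over NETCONF with ncclient.
from ncclient import manager

# Instance data; in practice this XML is generated from a vendor or
# standard YANG model (the namespace below is a placeholder).
EVPN_CONFIG = """
<config>
  <evpn xmlns="urn:example:params:xml:ns:yang:evpn">
    <instance>
      <name>tenant-blue</name>
      <vni>5001</vni>
      <route-distinguisher>65000:5001</route-distinguisher>
    </instance>
  </evpn>
</config>
"""

with manager.connect(host="192.0.2.10", port=830,
                     username="admin", password="admin",
                     hostkey_verify=False) as m:
    m.edit_config(target="candidate", config=EVPN_CONFIG)
    m.commit()  # activate the candidate configuration
```

Devices that do not support the candidate datastore would take the edit against the running datastore instead.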

3 Related Work

This section describes the research related to the work presented in this thesis. First, we give an overview of the research attempting to address the networking challenges of live migration. Then, we elaborate on the research pertaining to the cost of SFC reconfiguration when it entails VNF migration.

3.1 Challenges of Seamless Live VM Migration

Supporting seamless live VM migration poses several important networking challenges that are discussed in the following sections.

3.1.1 Retain Network Connectivity

The ability to maintain ongoing connections is essential for live VM migration. When a VM migrates between two physical servers inside a DC with an overlay technology covering the entire network, maintaining ongoing connections is relatively easy. However, migrating a VM between different networks may require the IP address(es) of the VM to change. As a result, the ongoing connections of the VM have to be re-established, which violates the seamlessness of live VM migration. Various solutions have been proposed to help the migrating node (e.g., a VM) preserve its network connections. According to [53], the proposed solutions can be divided into two main categories based on the OSI network layer: (1) high-level and (2) low-level schemes.

Solutions at the transport layer (e.g., TCP, UDP, SCTP) and above are collectively grouped as high-level schemes. The proposed solutions at this level are independent of the IP address. The authors of [128] thoroughly investigated the proposed solutions at the transport layer (L4). Migrate [112] is an example of a proposal that handles mobility at the session layer. Solutions based on HIP (Host Identity Protocol) [90] also fall into the high-level category. By detaching the naming function from the IP address, HIP supports mobility in the network. Instead of binding to IP addresses, applications are bound to HITs (Host Identity Tags), 128-bit global identifiers generated by hashing public keys. For an application to be globally reachable, a mechanism such as DNS [66] or rendezvous servers [67] is then used to map HITs to routable network-level addresses. A large amount of work has been done to enhance support for mobility using HIP; for further information about HIP-based solutions, we refer the reader to [96].

Solutions in the low-level category operate at the layers below the transport layer. The proposed solutions in this category attempt to keep the IP address static from the perspective of the transport layer. Zhang et al. [128] summarized the literature on lower-level schemes, including research at L2, e.g., [125], and at L3, e.g., [111]. For further information, we refer the reader to Table IV in [128].

In addition to the methods mentioned above, several new solutions have been proposed based on newly emerged protocols, technologies, and architectures. For instance, the authors in [30, 69, 95] used Multipath TCP (MPTCP) [43] to perform wide-area live VM migration. Several studies used overlay networks, e.g., LISP [104], to preserve the ongoing connections of the migrating node. Finally, a number of solutions utilized the characteristics of SDN to tackle the same problem [26, 50, 85, 104, 125].

However, the applicability of the proposed high- and low-level solutions is debatable. High-level solutions require modifications to the network stack of endpoint operating systems. Furthermore, high-level schemes increase the signaling and processing load on endpoint devices. Low-level schemes require infrastructural support, possibly specific hardware such as multiple network interfaces, or need all involved networks to support a specific protocol, e.g., Mobile IP [100]. Extending an overlay network, e.g., VXLAN, from one DC to another is neither scalable nor efficient, as it extends the broadcast domain. Moreover, some overlay technologies, such as legacy VPN-based solutions, are limited in terms of redundancy, scalability, flexibility, and forwarding policies. Finally, the proposed SDN-based solutions map old network addresses to new ones in order to maintain ongoing connections, which does not scale. Additionally, most SDN-based solutions assume that the controller has a holistic view over all DC networks; nonetheless, due to security policies and scalability requirements, DCs usually have their own controllers [73].

The combination of VXLAN and EVPN proposed in [109] is the solution that we use in this thesis to preserve network connectivity. We chose this solution due to its advantages over the alternatives. First, both VXLAN and EVPN are intrinsically designed to address the requirements of modern DCs, including live VM migration. Second, the integration of VXLAN and EVPN creates a scalable solution for preserving network connectivity in modern cloud infrastructures. While VXLAN enables seamless VM migration inside a DC, EVPN supports seamless inter-DC VM migration by extending L2 connectivity between remote sites in a solid and scalable manner. Finally, deploying VXLAN and EVPN does not require fundamental changes to the structure of today's DCs, as both are supported by the new generation of DC network fabrics. For more information about the advantages of VXLAN and EVPN integration, we refer the reader to [13].

References
