Cloud Auto-Scaling Control Engine Based on Machine Learning

Degree Project, Second Cycle, 30 Credits
Stockholm, Sweden 2018

Yantian You

2018-10-29

Master Thesis

Examiner
Gerald Q. Maguire Jr.

Academic adviser
Anders Västberg

Industrial advisers
Toni Satola
Roberto Muggianu

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Communication Systems


Abstract

With the development of modern data centers and networks, many service providers have moved most of their computing functions to the cloud. Considering the limitations of network bandwidth and hardware or virtual resources, how to manage different virtual resources in a cloud environment so as to achieve better resource allocation is a big problem. Although some cloud infrastructures provide simple default auto-scaling and orchestration mechanisms, such as the OpenStack Heat service, they usually depend on only a single parameter, such as CPU utilization, and cannot respond to network changes in a timely manner.

This thesis investigates different auto-scaling mechanisms and designs an online control engine that cooperates with different OpenStack service APIs based on various network resource data. Two auto-scaling engines, a Heat orchestration based engine and a machine learning based online control engine, have been developed and compared for different client request patterns. Two machine learning methods, neural networks and linear regression, have been considered for generating a control signal based on real-time network data. This thesis also shows the network's non-linear behavior under heavy traffic and proposes a scaling policy based on deep network analysis.

The results show that for offline training, the neural network and linear regression provide 81.5% and 84.8% accuracy, respectively. However, for online testing with different client request patterns, the neural network results differed from what we expected, while linear regression provided much better results. The model comparison showed that these two auto-scaling mechanisms have similar behavior for the SMOOTH-load pattern. However, for the SPIKEY-load pattern, the linear regression based online control engine responded faster to network changes, while the Heat orchestration service showed some delay. Compared with the proposed scaling policy, which uses fewer web servers while keeping response latency acceptable, both auto-scaling models waste network resources.


Sammanfattning

With the development of modern data centers and networks, many service providers have moved most of their computing functions to the cloud. Given the limitations of network bandwidth and hardware or virtual resources, managing different virtual resources in a cloud environment to achieve better resource allocation is a big problem. Although some cloud infrastructures provide simple default scaling and orchestration mechanisms, such as the OpenStack Heat service, they usually depend on only a single parameter, such as CPU utilization, and cannot respond to network changes in time.

This thesis investigates different auto-scaling mechanisms and designs an online control engine that cooperates with different OpenStack service APIs based on various network resource data. Two auto-scaling engines, a Heat orchestration based engine and a machine learning based online control engine, have been developed and compared for different client request patterns. Two machine learning methods, neural networks and linear regression, have been considered for generating a control signal based on real-time network data. This thesis also shows the network's non-linear behavior under heavy traffic and proposes a scaling policy based on deep network analysis.

The results show that for offline training, the neural network and linear regression provide 81.5% and 84.8% accuracy, respectively. For online testing with different client request patterns, however, the neural network results differed from what we expected, while linear regression gave much better results. The model comparison showed that these two auto-scaling mechanisms behave similarly for a SMOOTH-load pattern. For the SPIKEY-load pattern, the linear regression based online control engine responded faster to network changes, while the Heat orchestration service showed some delay. Compared with the proposed scaling policy, which uses fewer web servers while keeping response latency acceptable, both auto-scaling models waste network resources.


Acknowledgment

I would like to thank my main academic supervisor, Prof. Gerald Q. Maguire Jr. from KTH, for his guidance during this thesis project. He always provided thorough feedback when I found myself in trouble. I consider myself very lucky to have had the chance to work under his guidance.

I also want to thank my industrial advisers, Toni Satola and Roberto Muggianu, who provided me with a good working environment at Telia Company. They provided me with a lot of great ideas and useful suggestions and made me feel welcome at Telia.

Finally, thanks to my parents for their endless support and encouragement.

Stockholm, October 2018


Contents

1 Introduction
  1.1 Motivation
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Delimitations
  1.6 Methodology
  1.7 Outline

2 Background
  2.1 Data Center and Network
    2.1.1 Data Center
    2.1.2 Data Center Network
  2.2 Telia Company and Strategy
  2.3 Cloud Computing
  2.4 Virtualization
  2.5 Network Function Virtualization Framework
  2.6 Machine Learning in Networking
  2.7 Related Work
    2.7.1 Auto Network Management and Orchestration
    2.7.2 Auto-scaling Technique in Cloud Orchestration
    2.7.3 Machine Learning Based Network Analysis and Management

3 Testing Environment Establishment
  3.1 Setting Up a Cloud Platform
    3.1.1 OpenStack Platform
    3.1.2 Introduction of Different OpenStack Services
    3.1.3 Relationship Between OpenStack Services
  3.2 Testing Environment Structure
    3.2.1 Network Topology
    3.2.2 Clients
    3.2.3 Load Balancer
    3.2.4 HTTP Web Server
  3.3 Default Auto-scaling Mechanism
    3.3.1 Heat Template
    3.3.2 AutoScaling File
    3.3.3 How Default Auto-scaling Works
  3.4 New Auto-scaling Mechanism

4 Data Collection and Offline Model Training
  4.1 Data Collection
    4.1.1 HTTP Request Pattern
    4.1.2 Features Within Data Set
    4.1.3 Data Pre-processing
    4.1.4 Database Operations
    4.1.5 How Resource Data Changes Over Time
  4.2 Offline Model Training
    4.2.1 Neural Network Structure
    4.2.2 Hidden Layer Structure
    4.2.3 Parameter Tuning
    4.2.4 Linear Regression

5 Online Testing and Result Analysis
  5.1 Single Web Server
  5.2 Heavy Network Traffic with Unlimited Web Servers
  5.3 Client Request Pattern with Light Traffic
    5.3.1 SMOOTH-load Pattern
    5.3.2 SPIKEY-load Pattern
  5.4 Online Control Engine
    5.4.1 Auto-scaling Service
    5.4.2 Signal Generating Service
  5.5 Model Chosen
  5.6 Scaling for Heavy Traffic
    5.6.1 Heat Orchestration Based Policy
    5.6.2 Response Time Sensitive Policy
    5.6.3 Machine Learning Based Policy
  5.7 Scaling for Light Traffic
    5.7.1 Result on SMOOTH-load Pattern
    5.7.2 Scaling Policy Based on Deep Network Analysis
    5.7.3 Result on SPIKEY-load Pattern

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Problems and Limitations
  6.3 Future Work


List of Figures

2.1 Three-Tier Network Architecture ([1])
2.2 Cloud Computing Service Models
2.3 ETSI MANO Framework ([2])
2.4 AFI GANA Reference Model ([3])
3.1 Relationship Between OpenStack Services
3.2 Structure of Testing Environment
3.3 Network Topology of Load Balance Tenant
3.4 Network Topology of autoManage Tenant
3.5 Load Balancer Configuration
3.6 The Data Flow Among Different Modules for the Default Auto-scaling Mechanism
3.7 The Data Flow Among Different Modules for the New Auto-scaling Mechanism
4.1 The Pattern of Client Request Sending Rate (16 hours)
4.2 Meters for Ceilometer
4.3 Features in Training Data Set
4.4 Data Set Before Pre-processing
4.5 Data Set After Pre-processing
4.6 Change Pattern for cpu_util
4.7 Change Pattern for Network Incoming Rate
4.8 Change Pattern for Network Outgoing Rate
4.9 Change Pattern for Memory Usage
4.10 Change Pattern of Device Write Rate
4.11 Change Pattern of Number of Web Servers
4.12 Neural Network Structure
5.1 Request Rate for a Single Web Server
5.2 Response Time for a Single Web Server
5.3 Accumulated Error Responses for a Single Web Server
5.4 CPU Utilization for a Single Web Server
5.5 Request Rate for Unlimited Web Servers
5.6 Response Time for Unlimited Web Servers
5.7 Accumulated Error Responses for Unlimited Web Servers
5.8 Number of Sessions for Unlimited Web Servers
5.9 Average CPU Utilization vs Request Rate for Unlimited Web Servers
5.11 SMOOTH-load Pattern for Testing
5.12 SPIKEY-load Pattern for Testing
5.13 Neural Network Based Auto-scaling
5.14 Response Time for Neural Network Based Auto-scaling
5.15 Heat Orchestration Based Auto-scaling with Heavy Traffic
5.16 Average CPU Utilization for Heat Orchestration Based Auto-scaling
5.17 Total Response Time for Heat Orchestration Based Auto-scaling
5.18 Single Thread Response Time for Heat Orchestration Based Auto-scaling
5.19 Statistical Data of Response Time for Each Server
5.20 Response Time Sensitive Auto-scaling with Heavy Traffic
5.21 Average CPU Utilization for Response Time Sensitive Auto-scaling
5.22 Total Response Time for Response Time Sensitive Auto-scaling
5.23 Single Thread Response Time for Response Time Sensitive Auto-scaling
5.24 Statistical Data of Response Time for Each Server
5.25 Machine Learning Based Auto-scaling with Heavy Traffic
5.26 Response Time for Machine Learning Based Auto-scaling
5.27 The Results for SMOOTH-load Pattern
5.28 Response Latency for Heat Orchestration Based Auto-scaling
5.29 Response Time for Machine Learning Based Auto-scaling
5.30 Scaling Policy Based on Deep Network Analysis
5.31 The Results for SMOOTH-load Pattern
5.32 The Results for SPIKEY-load Pattern


List of Tables

3.1 OpenStack Lab Hardware Configuration
4.1 Parameter Tuning (3 neurons in the hidden layer)
4.2 Parameter Tuning (4 neurons in the hidden layer)
4.3 Parameter Tuning (5 neurons in the hidden layer)
4.4 Parameter Tuning (6 neurons in the hidden layer)
4.5 Data Set for Linear Regression Model


List of Acronyms and Abbreviations

API   Application Programming Interface
AWS   Amazon Web Services
BSS   Business Support System
DCN   Data Center Network
DEs   Decision Elements
EM    Element Management
ETSI  European Telecommunications Standards Institute
GANA  Generic Autonomic Networking Architecture
HTTP  HyperText Transfer Protocol
IaaS  Infrastructure as a Service
LB    Load Balancer
MANO  Management and Orchestration
MBTS  Model-Based Translation Service
NF    Network Function
NFV   Network Function Virtualization
NFVO  Network Function Virtualization Orchestrator
ONIX  Overlay Network for Information eXchange
OSS   Operations Support System
PaaS  Platform as a Service
PUE   Power Usage Effectiveness
SaaS  Software as a Service
SDN   Software Defined Network
SLA   Service Level Agreement
SLO   Service Level Objectives
VM    Virtual Machine
VNF   Virtual Network Function
VNFM  VNF Manager


Chapter 1

Introduction

Today, people communicate with each other through a huge network, and we can easily get in touch with other people. Different people own different resources and information, hence resource transactions and sharing between people can bring large benefits and efficiency to today's industry. To transfer this data, a network has been developed which interconnects the world. However, as more and more people utilize this network, the aggregate data flow has experienced explosive growth, and the development of the traditional network structure has fallen behind the increase in network data flow. As a result, modern networks that perform data flow control and network resource management have been developed.

This chapter addresses a specific problem in today's cloud networks, data flow control and resource management, and gives a description of Network Function Virtualization (NFV) based Virtual Network Function (VNF) orchestration. This chapter also describes the goals of this thesis project and outlines the structure of the thesis.

1.1

Motivation


Cloud orchestration is used to manage the interconnections and resources among servers in a cloud infrastructure environment, thus enabling a system to respond to changes in workload. This orchestration helps to coordinate network resource management and network function control [4, 5]. By exploiting NFV, cloud orchestration can be implemented in a more convenient way and offers more powerful functionality. This convenience and functionality can be used in next-generation network infrastructures [5].

Auto-scaling is a very important orchestration service which automatically adjusts the number of servers according to real-time network traffic. This technique helps service providers switch off unnecessary servers, enabling them to release occupied network resources in a timely manner when the network traffic is light. It can also switch on servers when the network traffic increases. With auto-scaling, service providers not only save network resources but also provide users with a high quality of service.

1.2

Problem

Network infrastructure has become more complex and flexible, and resources are potentially shared among Virtualized Network Functions. Many cloud platforms, such as OpenStack, have been used in conjunction with NFV technology to deploy various cloud services. Network virtualization brings a lot of benefits; for example, developers can develop services with more flexible functions in a convenient way. However, NFV has introduced difficulty in managing these virtualized resources. That is to say, efficient auto-management of network resources becomes more and more important.

On the other hand, machine learning has been widely used in data analysis, especially for pattern recognition and data prediction. However, it remains difficult to apply in networking, as network traffic usually varies with time and is difficult to predict.

Many cloud infrastructures provide simple default auto-scaling and orchestration mechanisms to better realize data flow control and resource management. For example, OpenStack's Heat orchestration service can provide auto-scaling by setting a threshold on a single resource metric such as CPU utilization. As servers usually take some time to react to network traffic changes, a single-resource (especially CPU utilization) based Heat orchestration service may have some delay in responding to the changes. Since delay is always a serious problem and we want our system to respond quickly, especially when there is a sudden growth of network traffic, we need to consider the following questions:

• Can we consider various types of network data to realize better network auto-management and prediction?


1.3

Purpose

The purpose of this thesis is to develop an online control engine for (web) server auto-scaling in the OpenStack platform to realize better data flow management for NFV based cloud orchestration. This control engine will leverage statistical analysis or a machine learning technique. By investigating several naïve cloud orchestration techniques (such as Heat orchestration), we can gain a basic understanding of how current Network Function Virtualization Orchestration (NFVO) has developed with respect to its use in a cloud environment. This degree project intends to compare these orchestration techniques (naïve and a new online control engine) to find the best solution concerning data flow control. By using raw data from a network, this project intends to achieve automatic network management. Hence functions such as resource allocation can be adjusted automatically based on real-time data. This auto-adjustment should make network management more convenient and flexible, since the system's behavior is learned from observations rather than requiring a priori network knowledge.

1.4

Goals

The goal of this degree project is to develop an OpenStack based online orchestration engine to solve the auto-scaling problem and compare it with an existing cloud orchestration service.

Firstly, this thesis project investigated several NFVO techniques used in the market to better understand their data flow control mechanisms.

Secondly, based on the previous investigation and an OpenStack based cloud environment, a simple network structure with clients, a load balancer, and several web servers (VMs) was built. A machine learning based control model was developed to cooperate with the Nova API, Ceilometer API, and Heat orchestration API to realize a web server auto-scaling mechanism by leveraging network traffic data collected by Ceilometer.

Finally, a comparison is made between the new machine learning based online control engine and the naïve Heat orchestration function in order to determine which method provides a better solution for the server auto-scaling problem.

1.5

Delimitations


The testing environment is an OpenStack based cloud system with only one load balancer and a few clients and web servers. Therefore, competition for resources only exists at the server side, hence we do not need to consider complex routing or forwarding protocols. The available VMs may become overloaded when the request rate grows, which triggers the auto-scaling process.

For the auto-scaling mechanism, all the web servers are built using the same image, which means the memory size, CPU processing rate, and allocated disk space are the same for all the web servers. During the scaling-down process, we assume the choice of which web server is removed will not affect the results or performance of the system.

Additionally, security and HTTP request checking are not considered in this work in order to simplify the testing setup.

1.6

Methodology

Quantitative and experimental research methods are used in this project to develop the model and to propose a solution that achieves better results. The quantitative research method is useful when performing experiments or testing systems with a large database. All conclusions and derivations in this project are drawn from an analysis of the experiments and from well-established theories.

The development of a new online control engine is based on existing orchestration services and well-developed machine learning models. A lot of investigation was done before starting to develop a service on the OpenStack Cloud Platform.

This master's degree project follows the development process of an ML-based solution as described in [6]: first the testing environment is set up, then data is collected and processed, then the model is trained, then the model is tested using new data generated by the real system, and finally the testing outcome is analyzed and used as feedback to improve the model.

As this project was carried out at Telia Company, it is important that everything that is supposed to be confidential remains confidential.

1.7

Outline

This thesis studies basic network data, specifically incoming and outgoing network request rates and CPU utilization of VMs by leveraging the OpenStack based cloud platform. The study was performed in the following steps:

1. The first step was to investigate existing cloud orchestration and auto-scaling techniques, together with machine learning methods. Chapter 2 describes the analysis of these models.

2. The next step was to configure a lab environment based on the OpenStack cloud platform, including understanding how different OpenStack models work with each other and how to interact with the models’ APIs. Chapter 3 introduces the OpenStack lab.

3. The next step was to establish a basic offline training model by investigating two different machine learning models: Neural Networks and Linear Regression. Chapter 4 describes the process of offline training.

4. Based on the offline training, real-time network data is collected by Ceilometer and used as feedback to create an online control engine. The testing results are analyzed and a comparison is made between Heat orchestration and the proposed new online control engine based on two client request patterns. This is described in Chapter 5.


Chapter 2

Background

This chapter provides basic background information about cloud computing and NFVO based auto network management. Section 2.1 gives an introduction to today's data center networks. Section 2.2 introduces Telia Company and Telia's strategy of closed-loop control. Section 2.3 gives an overview of cloud computing. Then, Section 2.4 discusses virtualization and what advantages and disadvantages virtualization can bring to a software-based network. Next, Section 2.5 describes the basic NFV framework according to the ETSI standard, while Section 2.6 discusses why we consider machine learning a powerful tool for network management. Finally, Section 2.7 summarizes previous work on auto network management, auto-scaling, and machine learning based network analysis.

2.1

Data Center and Network

Today, few workloads execute on a single computer. Clients, servers, applications, and middleware may be distributed over many different nodes. Cooperation between different data centers provides even more powerful network functions and services. However, distributed computing frameworks require the network to transfer information. This section describes the concept of a data center and modern data networks, as well as the relationship between them.

2.1.1

Data Center


Because a data center is typically very large and usually has many backup systems and storage, it consumes a large amount of power. A very important factor is Power Usage Effectiveness (PUE), which describes the power efficiency of a data center. A lot of effort has been made to push PUE values down toward the ideal of 1.0, thus creating energy-efficient data centers.
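For reference, PUE is defined as the ratio of the total power drawn by the facility to the power delivered to the IT equipment, so values closer to 1 mean less overhead for cooling, power distribution, and other support systems:

$$\mathrm{PUE} = \frac{\text{Total facility power}}{\text{IT equipment power}} \geq 1$$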

2.1.2

Data Center Network

The components within a data center frequently share data or functions with each other. This sharing means information transfer is very important within data centers in order to realize different applications or functions.

A data center network (DCN) plays an important role as it interconnects the components within a data center. As data centers are usually large-scale clusters containing thousands of nodes or even more, DCNs are usually very complex and are not easy to build or manage. A successful DCN architecture provides high scalability, high fault tolerance, and high-speed connectivity. There are many types of DCNs (such as three-tier DCNs, Fat tree DCNs, and DCell). These different types of DCNs aim to realize a more stable or powerful network structure.

Figure 2.1 shows the structure of a three-tier DCN as described in [1]. A three-tier DCN consists of three layers, each with its own type of network: the access network, the aggregation network, and the core network. These networks are connected via switches. The lowest layer, the access layer, consists of servers and layer 2/layer 3 (L2/L3) top-of-rack switches; each server in the access layer is connected directly to one of these top-of-rack switches. The aggregation layer contains higher-level L3 switches which connect the top-of-rack switches. The core switches in the core layer are responsible for connecting the aggregation layer switches as well as connecting the data center to the Internet. Although the three-tier DCN architecture is the most common network architecture used in data centers today, it has poor scalability; hence it cannot deal with the growing demands of cloud computing.

The fat-tree DCN is an improved version of the classic three-tier architecture which also realizes a hierarchical structure of access, aggregation, and core layers. However, it contains more network switches and is divided into k pods. Each pod contains (k/2)^2 servers, k/2 access layer switches, and k/2 aggregation layer switches, and there are (k/2)^2 core switches in total. Each core switch connects to one aggregation layer switch in each of the pods. By using these core switches, the fat-tree DCN overcomes the oversubscription problem by realizing a 1:1 subscription ratio. However, scalability remains a big problem for the fat-tree DCN.
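As a concrete check of these counts (standard fat-tree arithmetic, not figures taken from the thesis), consider $k = 4$:

$$\text{servers per pod} = (k/2)^2 = 4, \qquad \text{total servers} = k\,(k/2)^2 = \frac{k^3}{4} = 16, \qquad \text{core switches} = (k/2)^2 = 4.$$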


Figure 2.1: Three-Tier Network Architecture ([1])

DCell is another type of DCN with a recursively defined structure: low-level DCells are fully connected networks consisting of servers, and higher-level DCells are formed from several low-level DCells. Instead of using high-end core switches, DCell uses mini-switches to scale out, thus providing greater scalability. The major issues of DCell are network latency and cross-section bandwidth. Additionally, how to direct network traffic within and between layers is another problem.

2.2

Telia Company and Strategy

Telia Company is an international company with employees and customers all over the world. The company intends to develop a next-generation network which can provide more powerful communication services. It also works as a hub which connects digital ecosystems, people, companies, and societies together. The headquarters, located in Stockholm, acts as the heart of innovation and technology.

This master thesis has a close link to Telia's strategies within the GSO Network department, where technology innovation is matched with efficient resource utilization and accelerated time to market. For example, based on the cloud-native requirement, sophisticated load balancing algorithms should be investigated to achieve better data flow control. Several cloud orchestration services, such as auto-scaling, have been considered for better cloud resource management. A concept called closed-loop operation has been introduced which intends to realize network auto-management based on the system's feedback.

2.3

Cloud Computing

Cloud computing enables users to consume computing resources and services on demand without being concerned with how the underlying system manages the software and hardware resources. Data centers make cloud-based services and applications possible, as a large data center is logically a big resource pool which can provide a wide diversity of services.

Additionally, by sharing resources, it is easier for a given service to have just the correct amount of resources that it needs at a given time. Cloud computing suppliers usually provide their services via one of three types of models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS); these models offer increasing levels of abstraction to users [8, 9]. Figure 2.2 shows these service models.

Figure 2.2: Cloud Computing Service Models

Many cloud platforms are built by leveraging an open-source software platform (such as OpenStack) to provide IaaS, thus making virtual resources (such as virtual servers) available to users. As most of the services can be treated as a workflow through a chain of network functions, the management of a cloud platform can be split into two parts: resource allocation to different functions and data flow control through the allocated resources and deployed functions.

2.4

Virtualization

A very important concept in cloud computing is virtualization. Virtualization has been used in many areas in addition to cloud computing to provide convenience and flexibility. As PUE is always a very important factor when evaluating a data center, virtualization helps to save energy as the resources of a VM can be provisioned dynamically as the workload changes; hence the workload can have just enough resources for its needs, but not more.

Virtualization also allows multiple workloads to share the same physical machines, thus reducing the amount of unused hardware. Taking NFV as an example, virtualization helps to implement a network function using software (i.e., the network functions are no longer realized by connecting particular hardware devices). In this way, the required number of VMs with the appropriate software deployed on them can be used to realize the equivalent chain of network functions, while being able to be dynamically scaled up or down. However, as shown by Georgios P. Katsikas [10], NFV service chains can result in performance degradation and high latency. As service chains rely heavily on CPU performance, and existing NFV systems usually use multiple CPUs in parallel to realize these chains, NFV service chains still face performance problems even though new techniques such as fast network drivers have been used.

Another type of virtualization which has been widely used is network-level virtualization. Taking SDN as an example, network-level virtualization provides flexible and logically centralized management through a central controller [11].

2.5

Network Function Virtualization Framework

Network function virtualization has been widely used in cloud computing as it decouples the network functions (NFs) from particular physical infrastructure. Breaking the binding between NFs and hardware can provide a lot of flexibility in network management and improve resource utilization. That is to say, network functions and services can be realized in a more efficient way [12].

To develop standards for NFV, the European Telecommunications Standards Institute (ETSI) has proposed a Management and Orchestration (MANO) framework [2]. This model is also the basic framework used by Telia Company to realize network auto-management. MANO consists of three main components: the NFV orchestrator (NFVO), the VNF manager (VNFM), and the Virtualized Infrastructure Manager (VIM). These components are connected to network elements through reference points. The VNFs and the NFV infrastructure (NFVI) constitute the basic NFV architecture layer within the network, while Element Management (EM) and the Operations Support System (OSS)/Business Support System (BSS) constitute the network management system. Figure 2.3 shows this architecture.

Each model in the MANO framework has been introduced in [13] and [2]. VIM is connected to the NFVI model which contains both software and hardware resources. VIM can manage and control the resources in NFVI, usually within one operator’s infrastructure domain. MANO may contain several VIMs. Each VIM is allocated to a particular service to manage the resources for this service. A VIM could also be used to support the management of VNF forwarding graphs by creating and maintaining virtual networks.


Figure 2.3: ETSI MANO Framework ([2])

A VNFM manages the lifecycle of VNF instances. A Virtualized Network Function Descriptor (VNFD) is a kind of template which describes the deployment and behavior of each VNF instance and can be used to create and manage VNF instances. The matching between VNFD and VNF package is one-to-one, which means one VNFD describes the attributes and requirements of only one VNF instance. A VNFM is maintained and controlled by the NFV orchestrator.

The NFVO is responsible for orchestrating multiple NFVI hardware or software resources through VIMs and for managing the lifecycle of Network Services (NSs) through VNFMs. Four data repositories are connected to the NFVO: the NS catalog, the VNF catalog, NFV instances, and NFVI resources. These are used to store information about NSs, VNF packages, VNF instances, and NFVI resources, respectively. The NFVO can make use of the information from these four data repositories to provide end-to-end services.

2.6

Machine Learning in Networking


Machine learning within the networking area remains a fresh concept worth investigating. Many machine learning based applications in various key areas of networking have been introduced, compared, and evaluated in [6]. Machine learning has been applied extensively to several problems in networking, such as pattern recognition, understanding network traffic, predicting service metrics, and outlier detection. Machine learning techniques have mainly been used for network operations and management.

Although there is a dire need for machine learning based network management and operation, it remains a big challenge in this area [6]. The reasons lie in two aspects. Firstly, networks differ from each other, hence it is difficult to find a standard that attains uniformity across networks; a trained model which has proved to work in one network may be unsuitable for another network. Secondly, networks continue to evolve, which means an application developed using a fixed set of patterns may not be useful for network operation and management in the near future.

Several techniques such as SDN and NFV have been developed to promote the applicability of machine learning in networking. These techniques provide a new way to program networks by leveraging well-developed software structures. They also leverage the concept of virtualization to realize network operation and management in a more efficient way.

2.7

Related Work

Much research related to auto network management and orchestration has been done in previous work. This section presents previous work in the areas of modern NFV based auto-management structures, cloud auto-scaling techniques, and machine learning based network analysis.

2.7.1

Auto Network Management and Orchestration

A very important feature of MANO is auto-management: all the network functions and resources are managed automatically, depending on the current network state at the cloud provider's side, without client intervention.

To standardize such autonomic network management, the ETSI AFI working group (Autonomic network engineering for the self-managing Future Internet) has defined a reference model called the Generic Autonomic Networking Architecture (GANA), whose central building block is the Decision Element (DE).

Figure 2.4: AFI GANA Reference Model ([3])

A DE can automatically discover network instances and policies, as well as other DEs it may collaborate with. Each DE is assigned one or more Managed Entities (MEs), and a DE can automatically discover the network resources required by these MEs. Based on these discoveries, a DE can perform auto-performance tuning of its assigned MEs (covering configuration, optimization, self-repair, and so on). An ME is a managed resource which can vary depending on what kind of management the GANA requires; an ME could be an individual network element or a complex application.


2.7.2

Auto-scaling Technique in Cloud Orchestration

The auto-scaling technique is a simple NFV based orchestration service which provides resource management. Many rule-based auto-scaling approaches have been developed. For example, the Auto-scaling service provided by AWS[16], Azure Autoscale service provided by Windows Azure[17], and Scalr[18] are widely known.

However, whether auto-scaling based resource management is useful and how much benefit it can bring have been widely debated and studied. Ming Mao and Marty Humphrey have discussed how auto-scaling can be used to reduce cost [19]. In [20], an auto-scaling framework called SmartScale was developed which brings a lot of benefits by minimizing resource usage cost. SmartScale combines vertical scaling (adding more resources to existing VM instances) and horizontal scaling (adding more VM instances) to ensure applications' scalability. An application's deadline is another constraint that needs to be carefully considered for auto-scaling. Y. Ahn and Y. Kim [21] investigated various workflow patterns and extended a task-based auto-scaling algorithm [22] to support workflows. This auto-scaling method detects delays and deadline violations by comparing the actual and estimated finish times of running tasks and adjusts the number of VMs appropriately.

Many open source cloud platforms, such as OpenStack, support various auto-scaling approaches. For an OpenStack based cloud platform, Heat orchestration [23] can provide an auto-scaling service in cooperation with the Ceilometer data collection service. A Heat orchestration template is used to manage different OpenStack resources, such as the AutoScalingGroup and ScalingPolicy resources for the Heat service and an Alarm resource for the Ceilometer service. Another open-source platform called Docker [24] can also be used to develop applications on hybrid hosts and realize some auto-scaling functions. Y. Li and Y. Xia [25] designed a platform which can auto-scale web applications in a Docker-based cloud environment. A scheduling controller was built in that project (introduced in [25]) to realize application management by combining prediction and reaction algorithms.

2.7.3

Machine Learning Based Network Analysis and Management

Many people consider machine learning a powerful tool to realize data flow control or to realize auto-management of a network. However, the traffic within a network changes over time which makes machine learning in this area difficult to implement.


On-line learning can be considered for streaming data and real-time analysis. In [28], a real-time analytics engine was introduced to process real-time network traffic rather than require a priori detailed knowledge of the system’s components. The project focuses on a critical part of service assurance, namely, the capability of a provider to estimate the service quality based on measurements of the provider’s infrastructure. Several machine learning based statistical models such as lasso regression, regression tree, and random forest were compared. The result shows that the random forest has the best performance. This project also designed a real-time analytics engine which performs online training and prediction which can be considered as a building block for future real-time network auto-management.


Chapter 3

Testing Environment Establishment

This chapter describes how we set up the testing environment. Section 3.1 gives an introduction to the OpenStack cloud platform used in this project, including a detailed description of each OpenStack service and how they cooperate with each other. Section 3.2 introduces the overall structure of the testing environment we designed for online resource management, along with a detailed introduction of each element. Section 3.3 gives an overview of how the default auto-scaling mechanism provided by the OpenStack Heat orchestration service works. Then, Section 3.4 introduces our new auto-scaling mechanism.

3.1

Setting Up a Cloud Platform

This section starts with a brief introduction to the OpenStack platform, then gives more detailed information about each OpenStack service and how they cooperate with each other.

3.1.1

OpenStack Platform

The OpenStack platform used in this project is based on the Mirantis Cloud Platform (MCP) maintained by Huawei. MCP includes individual VM artifacts for core services and provides a suite of open source Operations Support Systems which help users to log, control, and monitor OpenStack services in a better way [30]. MCP uses the DriveTrain toolchain [31] to continuously deliver these services to a cloud environment.

3.1.2

Introduction of Different OpenStack Services

Several OpenStack services are used in this project. Each of them is described in more detail below:

Keystone: Keystone is the OpenStack Identity Service, which provides API client authentication, service discovery, and distributed multi-tenant authorization.

Glance: Glance is the OpenStack Image Service, via which users create or discover VM image metadata. This metadata can be used by other services through a RESTful API; for example, Nova can launch new instances based on an image provided by Glance.

Nova: Nova is the most important service supported by OpenStack, as it provides a way to create and manage compute instances. Nova can create virtual machines and bare metal servers, and it even provides limited support for system containers. Nova runs as a set of daemons on top of existing Linux servers.

Cinder: Cinder is the OpenStack Block Storage Service, which provides block backups and makes the OpenStack platform fault-tolerant, recoverable, and highly available.

Neutron: Neutron provides network connectivity between interface devices, such as virtual network interfaces (vNICs). It maintains the networks between instances and manages the routers and interfaces for each network.

Horizon: Horizon provides a web-based user interface to OpenStack services and is implemented as OpenStack's Dashboard. Its graphical user interface (GUI) enables users to manage and maintain OpenStack services in a more convenient way.

Heat: Heat provides OpenStack's Orchestration service, a human-accessible and machine-accessible service for managing the entire lifecycle of the cloud infrastructure and applications within OpenStack clouds.

Ceilometer: Ceilometer is the OpenStack data collection service, which transforms and normalizes data across all current OpenStack core components (work is underway to support future components). Ceilometer has meters to collect data from different resources and stores these data in a MongoDB database. Ceilometer can provide data to the Heat service to realize auto-management of stack resources.

3.1.3

Relationship Between OpenStack Services

Figure 3.1: Relationship Between OpenStack Services

3.2

Testing Environment Structure

The structure of the testing environment is introduced in this section along with a detailed description of each model. We built our own testing environment based on MCP provided by Telia Company. Figure 3.2 shows the structure of this testing environment.

This project was divided into two tenants: an autoManage tenant and a Load balance tenant. Within the autoManage tenant, we created a server pool with limited network resources (such as IP address, virtual CPUs, and so on). These resources are subsequently allocated to new web server instances. The number of web server instances within the server pool is flexible and depends on the requirements needed to service the current load. Within the Load balance tenant, a load balancer and several HTTP traffic generators have been created. These HTTP traffic generators act as if they were real-world clients, while the load balancer directs incoming traffic to different web servers.

Figure 3.2: Structure of Testing Environment

Several additional modules were created to realize the auto-scaling mechanism. More details will be given for each module in the following sections.

The hardware configuration for OpenStack Lab is shown in Table 3.1.

3.2.1

Network Topology

Using the testing environment structure, a network topology was created via the OpenStack Neutron service for use in this project. The resulting topology is shown in Figure 3.3 and Figure 3.4.

The number of clients in the client network shown in Figure 3.3 is flexible and depends on the testing requirements. The webServer shown in Figure 3.4 in the server network represents one instance from the server pool; these instances can be automatically added or removed by the auto-scaling mechanism. The two tenants are connected via a shared load balancer network. The load balancer receives incoming HTTP requests from the client network and distributes these requests to the web servers in the server network. This means that the actual IP addresses of the web servers are private and invisible to the clients; only the IP address of the load balancer is visible. Therefore, clients communicate with the load balancer via this single IP address.


Figure 3.3: Network Topology of Load Balance Tenant


Table 3.1: OpenStack Lab Hardware Configuration

Server, Management Node (CH222 v3):
  CPU: Intel CPU E5-2658A v3 x 2
  Memory: 256 GB (16 GB x 16)
  Disk: 900 GB 10K 2.5" SAS hard disk x 6 + 1.6 TB 2.5" SSD hard disk
  RAID card: LSI3108
  Mezz card: MZ310 + MZ310 (dual 10 Gb Ethernet ports per Mezz card)

Server, FusionStorage Node (CH222 v3):
  CPU: Intel CPU E5-2658A v3 x 2
  Memory: 256 GB (16 GB x 16)
  Disk: 900 GB 10K 2.5" SAS hard disk x 12 + 1.6 TB 2.5" SSD hard disk
  RAID card: LSI3108
  Mezz card: MZ310 + MZ310 (dual 10 Gb Ethernet ports per Mezz card)

Server, Compute Nodes (CH121 v3):
  CPU: Intel CPU E5-2658A v3 x 2
  Memory: 256 GB (16 GB x 16)
  Disk: 900 GB 10K 2.5" SAS hard disk x 2
  RAID card: LSI3108
  Mezz card: MZ312 (four 10 Gb Ethernet ports) + MZ710 (dual 40 Gb Ethernet ports)

Server, Blade Chassis (E9000):
  Form factor: 12U
  Embedded switch: CX310 (16 x 10GE uplink, 32 x 10GE downlink) x 2 + CX710 (8 x 40GE uplink, 16 x 40GE downlink)
  Fan module: 14 hot-swappable fan modules in N+1 redundancy mode
  Power supply module: maximum six 3000 W/2000 W AC or six 2500 W DC hot-swappable PSUs, N+N or N+M redundant

Storage, IP-SAN Storage (OceanStor 5500 v3):
  Dual controllers, 128 GB high-speed cache
  10G iSCSI port x 8
  600 GB 15K 2.5" SAS hard disk x 25

3.2.2

Clients

Each client instance in this project simulates a user's behavior in the real world. Each client instance is a CentOS based virtual machine created and managed by the Nova service. Httperf [32, 33] has been used to generate various HTTP workloads for measuring web server performance. The operation of httperf can be controlled through options such as rate, max-connections, timeout, and so on. When a client is created, it sends HTTP requests to the load balancer via httperf commands issued from a shell script. Within this shell script, loops are created for clients to generate HTTP requests and to change the request rate. The HTTP request rate is determined by a rate option and changes over time following a given load pattern [28] (a sketch of such a script follows the list below). These load patterns are:

Constant-load pattern A fixed number of clients will be created. These clients send requests at a constant rate.

SMOOTH-load pattern The HTTP request rate starts at an initial rate and increases by a certain increment after each loop. After a certain number of loops, the increment becomes negative, so the HTTP request rate decreases in the following loops and returns to the initial rate.

For example, if the initial rate is 500 requests/s and the increment is 20 requests/s, then after the first loop the HTTP request rate becomes 520 requests/s.

SPIKEY-load pattern On top of a baseline rate, the HTTP request rate will suddenly surge to a very high value and then drop back to the original value after a very short time.

These load patterns allow us to test web server performance in most situations. Timestamps and the HTTP request rate at each timestamp are output to a file for further analysis.
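As an illustration of such a client script, a minimal SMOOTH-load sketch is shown below. The load balancer address and the loop counts are illustrative assumptions, not the thesis's configuration; the httperf options used (--server, --port, --rate, --num-conns, --timeout) are the standard httperf options mentioned above.

    #!/bin/sh
    # SMOOTH-load client sketch: ramp the request rate up, then back down.
    LB_ADDR=192.168.1.100   # load balancer address (illustrative)
    RATE=500                # initial rate, requests/s
    STEP=20                 # increment per loop

    for i in $(seq 1 40); do
        if [ "$i" -gt 20 ]; then STEP=-20; fi   # second half: ramp back down
        RATE=$((RATE + STEP))
        echo "$(date +%s) $RATE" >> rate.log    # log timestamp and current rate
        httperf --server "$LB_ADDR" --port 80 \
                --rate "$RATE" --num-conns 1000 --timeout 5
    done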

3.2.3

Load Balancer

Load balancers are needed to distribute incoming HTTP requests from the clients to the web servers within the server pool in the autoManage tenant, and to hide the private IP addresses of the web servers from the public network for security purposes. As described earlier, the load balancers used in this project are located in a shared network between the two OpenStack tenants. The load balancers can communicate with the web servers and clients through their private IP addresses. The service we used to realize load balancing is HAProxy [34]. HAProxy is open source software which can be used to build a proxy server that provides a highly available load balancing service for TCP and HTTP based applications.

The load balancer in this project is a virtualized network function which can be treated as an instance in the OpenStack platform; it is created through the Nova service. This instance is a CentOS based virtual machine with the HAProxy service [34] configured on it. A round-robin policy was used for traffic distribution. An IP address pool for the web servers was created for the load balancer to choose from; a private IP address from this pool is allocated to each web server during instance initialization. Figure 3.5 shows the backend and frontend setup for the load balancer service used in this project. The size of the web server pool is ten, hence there are ten available IP addresses for web servers to choose from.
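For illustration, a minimal HAProxy configuration of this shape is sketched below. The section names and server addresses are assumptions; the project's actual frontend/backend configuration is the one shown in Figure 3.5, and the 5,000-session limit matches the slim value described in Chapter 4.

    frontend http_in
        bind *:80
        maxconn 5000                 # session limit (reported as slim in the statistics)
        default_backend web_servers

    backend web_servers
        balance roundrobin           # round-robin policy, as described above
        server web1 192.168.1.11:80 check
        server web2 192.168.1.12:80 check
        # ... one entry per private address in the ten-address pool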

3.2.4

HTTP Web Server

Each web server in the server pool is an individual instance running on Centos OS. An instance can be launched in two different ways, through the Heat service for a traditional auto-scaling mechanism or via the Nova service for our new online control engine. More details will be given for the two different auto-scaling mechanisms in Sections 3.3 and 3.4, respectively.

After the instance has been built, we configure some web services on it to make it a running web server. The web server program used in this project is the Apache HyperText Transfer Protocol (HTTP) server program called httpd [35]. This server handles HTTP requests through a pool of child processes or threads. It is configured in each instance during initialization, via the user data option, by using the commands below:

sudo setsebool -P httpd_can_network_connect_db=1

Figure 3.5: Load Balancer Configuration

We assume that the network resources in a web server pool are limited; hence the maximum number of web servers is also limited. For example, if we have an IP address pool with only ten private IP addresses in it, then we can create at most ten web servers. How many network resources should be allocated to each server pool is the question we need to consider carefully given the required service demand and the capacity of each server.

3.3

Default Auto-scaling Mechanism


3.3.1

Heat Template

To realize cloud orchestration, Heat creates a component called a stack by using either the AWS CloudFormation template format or the native OpenStack Heat Orchestration Template (HOT) format. Both of these formats enable the developer to describe the applications, services, and network infrastructure within a stack in an easy way. The template format used in this thesis is HOT, and several resource types are included in it, such as floating IP address, image type, volume, private key, security group, and user data.

Three HOT files have been used in this project to create a stack with the traditional auto-scaling mechanism. These files are centos, environment, and autoScaling.

centos Describes the resource type for one instance (of a web server) in the server pool.

This file includes resource types such as OS::Nova::Server (defines the instance's name, flavor, key name, image type, and user data), OS::Neutron::Port (defines which subnet the instance belongs to), and OS::Neutron::FloatingIP (allocates one public floating IP address to the instance).

environment Describes the environment variable for stacks.

For example, we can define centos as a user-defined resource in the environment file based on the centos file we created earlier.

autoScaling Describes the auto-scaling mechanism for the stack.

Some Heat and Ceilometer resources are defined in this file for the auto-scaling function. This file is the most important one for the traditional auto-scaling mechanism. More details about this file are given in the next section.

3.3.2

AutoScaling File

Three resource types are used in this file to realize the auto-scaling mechanism: OS::Heat::AutoScalingGroup, OS::Heat::ScalingPolicy, and OS::Ceilometer::Alarm. A minimal template sketch follows the list below.

• OS::Heat::AutoScalingGroup: this resource defines a web server pool with properties for the cooldown period (to ensure that the stack doesn't launch or terminate additional instances before the cooldown expires), max size (maximum number of servers in the pool), and min size (minimum number of servers in the pool). The resource type for instances within the server pool is OS::Nova::Server::Cirros, which has already been defined in the environment file.

• OS::Heat::ScalingPolicy: this resource defines how the group scales. It references the scaling group through its auto scaling group id property and defines the adjustment type (such as change in capacity), the scaling adjustment (for example, adding or removing one instance), and a cooldown period. The policy exposes an alarm URL attribute which is used as an alarm action, so that a triggered alarm invokes the policy.

• OS::Ceilometer::Alarm: this resource uses Ceilometer to create an alarm for the Heat scaling policy. Ceilometer has different types of meters, such as cpu_util and the network incoming rate, for different network resources. The alarm defines the meter name (which meter is used for auto-scaling), the statistic (average, maximum, or minimum), the period, the threshold for the alarm, the alarm actions (what the system does once the alarm fires), and the comparison operator (gt means greater than the threshold and lt means less than the threshold).
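To make the relationship between these three resources concrete, a minimal HOT sketch is shown below. The property values (group sizes, 180 s period, 80% CPU threshold) are illustrative assumptions, not the thesis's actual template; the resource types and property names follow the Heat and Ceilometer documentation.

    heat_template_version: 2016-04-08

    resources:
      web_server_group:
        type: OS::Heat::AutoScalingGroup
        properties:
          min_size: 1
          max_size: 10            # bounded by the ten-address pool
          cooldown: 300
          resource:
            type: OS::Nova::Server::Cirros   # defined in the environment file

      scale_up_policy:
        type: OS::Heat::ScalingPolicy
        properties:
          adjustment_type: change_in_capacity
          scaling_adjustment: 1   # add one web server per alarm
          cooldown: 300
          auto_scaling_group_id: { get_resource: web_server_group }

      cpu_alarm_high:
        type: OS::Ceilometer::Alarm
        properties:
          meter_name: cpu_util
          statistic: avg
          period: 180             # matches the project's 3-minute sampling period
          evaluation_periods: 1
          threshold: 80           # illustrative threshold (%)
          comparison_operator: gt
          alarm_actions:
            - { get_attr: [scale_up_policy, alarm_url] }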

3.3.3

How Default Auto-scaling Works

Ceilometer periodically and automatically collects data from different network resources through meters. The sampling period for data collection is defined in the Ceilometer configuration file (ceilometer.conf) and the pipeline configuration file (pipeline.yaml) and can be changed according to requirements. The data processing can be considered a pipeline within the Ceilometer service: pipelines describe, at the configuration level, a coupling between sources of data and the corresponding sinks for data transformation and publication [36]. Within the pipeline.yaml file, there is a parameter called interval which sets the sampling period.
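For reference, the interval parameter sits inside a source definition in pipeline.yaml. A minimal sketch with the project's 3-minute period is shown below; the source and sink names are the common defaults, assumed rather than taken from the thesis:

    sources:
        - name: meter_source
          interval: 180          # sampling period in seconds (3 minutes)
          meters:
              - "*"              # collect all meters
          sinks:
              - meter_sink

    sinks:
        - name: meter_sink
          transformers:
          publishers:
              - notifier://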

Once data has been collected and saved into the database on the controller, a Ceilometer alarm can provide Monitoring-as-a-Service for a resource running on OpenStack by using these data. An alarm has three states: ok, alarm, and insufficient data. The ok state is set once the rule governing the alarm has been evaluated as false. The alarm state is set once the rule governing the alarm has been evaluated as true. Finally, the insufficient data state is set when there are not enough data points available during the evaluation periods to meaningfully determine the alarm state [37].

When the alarm state changes from ok to alarm, the Heat orchestration service invokes a scale-up or scale-down action, which, depending on the thresholds, automatically creates or deletes one web server in the stack. Figure 3.6 shows the data flow.

3.4

New Auto-scaling Mechanism

In this project, we have created a new online control engine based on a machine learning model. This control engine realizes a new auto-scaling mechanism: instead of using a Heat based stack to manage resources, the online control engine cooperates with the Nova service to realize the server auto-scaling function. As before, the Ceilometer service is used for data collection.

Figure 3.6: The Data Flow Among Different Modules for the Default Auto-scaling Mechanism

The control engine uses these data to generate a control signal that adjusts the number of servers. Figure 3.7 shows the data flow for this new auto-scaling mechanism.
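To illustrate this data flow, a skeleton of such an online control loop is sketched below in Python. The helper fetch_features and the image/flavor constants are hypothetical placeholders, not the thesis's implementation; in the real engine the features come from Ceilometer and HAProxy, the control signal comes from the trained model (Chapter 4), and servers are added or removed through the Nova API (here via python-novaclient bindings).

    import time

    SAMPLING_PERIOD = 180        # seconds, matching the Ceilometer interval
    WEB_IMAGE = "webserver-img"  # hypothetical image name
    WEB_FLAVOR = "m1.small"      # hypothetical flavor name

    def control_loop(model, nova, min_size=1, max_size=10):
        """Periodically read network metrics, predict the desired pool size,
        and ask Nova to add or remove web servers accordingly."""
        while True:
            features = fetch_features()  # hypothetical helper: Ceilometer + HAProxy data
            desired = int(round(model.predict([features])[0]))  # control signal
            desired = max(min_size, min(max_size, desired))
            servers = nova.servers.list()
            if desired > len(servers):
                for _ in range(desired - len(servers)):          # scale up
                    nova.servers.create(name="webserver",
                                        image=WEB_IMAGE, flavor=WEB_FLAVOR)
            elif desired < len(servers):
                for server in servers[:len(servers) - desired]:  # scale down
                    # which server is removed is assumed not to matter (Section 1.5)
                    server.delete()
            time.sleep(SAMPLING_PERIOD)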


Chapter 4

Data Collection and Offline Model Training

This chapter describes the procedure of how we collect data from the system and how we realize offline training. Section 4.1 gives an introduction to the services we used to collect network data along with a detailed analysis of these network data. Section 4.2 introduces the offline training models we used in this project along with the training processes and results.

4.1

Data Collection

A labeled data set is very important for the machine learning training step. This section introduces how we generate network data and how we store and fetch these data depending on the desired real-time client request pattern.

All the network resource data on the server side are collected through the Ceilometer service. Some parameters, such as the sampling period, must be decided upon before data collection. If the sampling period is too long, the testing period becomes very long (i.e., the system is not very responsive to changes) and the data we collect do not reflect real-time changes. On the other hand, if the sampling period is too short, there is some data overlap between two sampling periods. After some initial testing, we set the sampling period for Ceilometer to 3 minutes in this project.

HAProxy [34] also provides a variety of statistics metrics to monitor the network and show its current state. The statistics metrics used in this project are listed below:

scur   Number of current sessions.
slim   Session limit; in this project, it has been configured to 5,000.
eresp  Accumulated number of response errors.
rate   Number of sessions per second during the last elapsed second.
rtime  The average response time in ms over the last 1024 requests.
ttime  The average total session time in ms over the last 1024 requests.

These data are retrieved through HAProxy's statistics interface using socket communication. A utility called netcat [38] is used to realize this socket communication with the following command:

    echo "show stat" | nc -U /var/run/haproxy.sock
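The same statistics can also be read programmatically. Below is a small sketch that queries the stats socket from Python and parses the CSV reply; the socket path matches the command above, and the field names follow HAProxy's "show stat" output:

```python
import socket
import csv
import io

def haproxy_stats(sock_path="/var/run/haproxy.sock"):
    """Send "show stat" to HAProxy's UNIX socket and parse the CSV reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(b"show stat\n")
    chunks = []
    while True:                      # HAProxy closes the socket when done
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    # The first line is a CSV header prefixed with "# ".
    raw = b"".join(chunks).decode().lstrip("# ")
    return list(csv.DictReader(io.StringIO(raw)))

for row in haproxy_stats():
    print(row["svname"], row["scur"], row["rate"], row["rtime"])
```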

4.1.1 HTTP Request Pattern

Using the SMOOTH-load pattern introduced earlier, three clients are created to generate HTTP requests. Figure 4.1 shows the real-time patterns of their aggregated HTTP request sending rate.

Figure 4.1: The Pattern of Client Request Sending Rate (16 hours)

This figure only shows the changes in the client sending rate over a short period (16 hours). The rate starts at a very small value, increases smoothly to a maximum, and then decreases smoothly back to the initial value. The whole increase-and-decrease process can be considered one loop and is repeated over and over again. We let the system run for over five days to generate the training data; as a result, there are nearly 2400 timestamps in the data set.

4.1.2 Features Within Data Set


Some features never change over time and were removed from the data set. For example, disk.device.read.bytes.rate always remains 0 and disk.allocation remains constant; since neither changes over time, they do not reflect real-time network conditions. We also removed all the cumulative data types from the data set, as it is difficult to detect real-time network changes using cumulative data. Although we could derive the difference between cumulative values and act on these differences, the results would serve a similar function as the gauge values. For example, for disk.device.write.bytes, the difference between cumulative values (assuming the sampling period is 1 second) would give the byte rate, which is the same as the disk.device.write.bytes.rate gauge.
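A sketch of this filtering step, assuming the collected features sit in a pandas DataFrame (the file name is hypothetical):

```python
import pandas as pd

df = pd.read_csv("features.csv")   # one column per candidate feature
# Drop features that never change, e.g. disk.allocation.
constant = [c for c in df.columns if df[c].nunique() <= 1]
df = df.drop(columns=constant)
```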

Figure 4.2: Meters for Ceilometer

(46)

Figure 4.3: Features in Training Data Set

4.1.3 Data Pre-processing

After collecting all the data we need, we must pre-process it to make it fit our training model. This pre-processing can be divided into two steps.

In the first step, we compute a statistic (average, sum, etc.) over all the instances within a sampling period (3 minutes) and use it as the sampling value for that period. The reason for this operation is two-fold: (1) the heat based default auto-scaling mechanism only considers one network feature (average CPU utilization); however, we intend to create a simulation model based on more of the network features, specifically the average value of a particular feature within the same sampling period. The statistic chosen depends on the type of network feature being processed. We consider the average the most important statistic because we want to mimic what the Heat orchestration service does with these data: the average is the statistic most frequently applied to the CPU utilization of a single instance in the Heat service. For the network incoming and outgoing byte rates, we use the sum instead of the average. (2) At each timestamp, we should have only one data value per feature as an input to our model.
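A sketch of this aggregation step, assuming the raw samples are exported to a flat file with one row per (timestamp, resource, meter); the file name is hypothetical, while the column names follow ceilometer's sample schema used in Section 4.1.4:

```python
import pandas as pd

samples = pd.read_csv("ceilometer_samples.csv", parse_dates=["timestamp"])

# Sum throughput counters across instances; average everything else.
SUMMED = {"network.incoming.bytes.rate", "network.outgoing.bytes.rate"}

def period_value(group):
    if group.name[1] in SUMMED:            # group key is (period, counter_name)
        return group["counter_volume"].sum()
    return group["counter_volume"].mean()

per_period = (samples
              .groupby([pd.Grouper(key="timestamp", freq="3min"),
                        "counter_name"])
              .apply(period_value)
              .unstack("counter_name"))    # one row per 3-minute period
```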


Figure 4.4: Data Set Before Pre-processing

Figure 4.5: Data Set After Pre-processing

In the second step, the data is normalized so that features with very different value ranges become comparable. Three common methods were considered:

• Rescaling (min-max normalization): $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$

• Mean normalization: $x' = \frac{x - \operatorname{average}(x)}{\max(x) - \min(x)}$

• Standardization: $x' = \frac{x - \bar{x}}{\sigma}$
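These three methods correspond directly to the following sketch (NumPy assumed; x is a 1-D array of one feature's values):

```python
import numpy as np

def rescale(x):            # min-max normalization to [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def mean_normalize(x):
    return (x - x.mean()) / (x.max() - x.min())

def standardize(x):        # zero mean, unit standard deviation
    return (x - x.mean()) / x.std()
```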

We also used KB/s instead of B/s in this project. The reason is that the values collected from the network vary between features from around 1 to hundreds of thousands. For example, the value of disk.device.write.requests.rate is less than 5, while network.incoming.bytes.rate usually varies between $10^2$ and $10^5$. This makes some of the coefficients quite small (the smallest is around $10^{-6}$), which means more bits would be needed to represent them in binary format (around 20 bits). With only a fixed number of bits to represent a value, we may lose accuracy for these near-zero values; for example, with only 16 bits, most of these coefficients would be set to 0. Since this thesis only deals with software-based network control and all the data is in decimal format, this problem does not occur in this project. However, we still use KB/s in case of future hardware development and data transmission.

4.1.4 Database Operations


Ceilometer stores its samples in a MongoDB database on the controller; all the data collected by the ceilometer meters is saved in a collection called meter within this database. The IP address (192.168.0.103) and password of the database are defined in the ceilometer configuration file. We can connect to this database through an ssh tunnel using this information.

After the tunnel has been established, we can select the data we need from this database with a MongoDB shell script. For example, we can retrieve cpu_util using the command shown below:

    db.meter.find({ counter_name: "cpu_util",
                    project_id: "05825ebfbc60439b836886199af593ea" })
            .sort({ timestamp: -1 })

In the above, project_id is used to select data for a particular tenant. "05825ebfbc60439b836886199af593ea" is the ID of the autoManage tenant, thus the CPU utilization data only comes from the web servers.
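The same query can be issued from Python, e.g. with pymongo, once the ssh tunnel forwards the controller's MongoDB port to localhost; the connection string and credentials below are illustrative:

```python
from pymongo import MongoClient, DESCENDING

db = MongoClient("mongodb://ceilometer:password@localhost:27017/ceilometer").ceilometer
cursor = (db.meter
          .find({"counter_name": "cpu_util",
                 "project_id": "05825ebfbc60439b836886199af593ea"})
          .sort("timestamp", DESCENDING))
```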

A MongoDB shell script can also be used to process the data. We retrieve the average value of a feature during each sampling period using the command shown below; the value of counter_name should be changed according to the feature whose average value we want.

    var cpu_util = db.meter.find({ counter_name: "cpu_util",
                                   project_id: "05825ebfbc60439b836886199af593ea" })
                           .sort({ timestamp: -1 })
    cpu_util.forEach(function(doc) { db.temp_data.insert(doc) })
    db.getCollection("temp_data").aggregate([
        {
            $project: { new_timestamp: { $substr: ["$timestamp", 0, 15] },
                        counter_name: 1, user_id: 1, resource_id: 1,
                        timestamp: 1, counter_volume: 1 }
        },
        {
            $match: { new_timestamp: { $gt: "2018-xx-xx" } }
        },
        {
            $group: { _id: "$new_timestamp", count: { $sum: 1 },
                      avg: { $avg: "$counter_volume" } }
        },
        {
            $out: "autoScaling"
        }
    ])

4.1.5 How Resource Data Changes Over Time


Understanding how the resource data changes over time helps in designing the new auto-scaling mechanism. Figures 4.6 to 4.11 show how the patterns of different network resources change for the SMOOTH-load pattern.

Figure 4.6: Change Pattern for Cpu util

Figure 4.7: Change Pattern for Network Incoming Rate

Figure 4.8: Change Pattern for Network Outgoing Rate

Figure 4.9: Change Pattern for Memory Usage

Figure 4.10: Change Pattern for Device Write Rate

Figure 4.11: Change Pattern for Number of Web Servers


One exception is memory usage, which decreases more slowly than the other features because the CPU usually takes some time to release unoccupied memory. This property (delayed memory release) will be useful for linear regression and is introduced in detail in Section 4.2.2.

4.2 Offline Model Training

Two machine learning models are considered for offline training: a neural network and linear regression. The reasons for choosing these two models lie in the following two aspects:

1. These two machine learning models are well developed and offer powerful functionality. Neural networks are widely used in classification problems, while linear regression deals with problems where there is a linear relationship between parameters. We want to compare how these two types of machine learning model (classification model versus regression model) work for the auto-scaling problem.

2. Although the number of web servers is an integer between the minimum and maximum number of servers in the system, the change patterns of the network resource data in Section 4.1.5 suggest a somewhat linear relationship between the network resource data and the number of servers. However, auto-scaling cannot be treated as a simple classification problem, since the number of web servers in the pool may scale to a previously unseen value as long as there are enough resources in the server pool. For example, if we set the maximum number of servers in the pool to 5 during the training step, a classification method has five possible output classes, while linear regression learns a linear relationship between the input network data and the output. What if we change the maximum number of servers to 10 for online testing? The classification model can still only produce five possible outputs unless we re-train it, while linear regression can easily scale to 10 depending on the network traffic. Methods such as rounding up and rounding down can be used to make the output an integer according to the system's requirements; for example, if the system is resource sensitive, rounding down would be preferred (see the sketch after this list). Although rounding down leads to an increase in failures to meet SLOs, for systems with moderate SLOs this is not a big problem. Several tests should be conducted to choose the approximation method that best balances the different requirements.
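As sketched below, the continuous regression output can be clipped and rounded to a server count; the function name and bounds are illustrative:

```python
import numpy as np

def servers_from_prediction(y_pred, n_min=1, n_max=10,
                            resource_sensitive=True):
    """Map a continuous regression output to an integer server count.

    Rounding down conserves resources at the cost of occasional SLO
    misses; rounding up does the opposite."""
    n = np.floor(y_pred) if resource_sensitive else np.ceil(y_pred)
    return int(np.clip(n, n_min, n_max))
```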


4.2.1 Neural Network Structure

Figure 4.12 shows a very simple neural network structure with one input layer, one hidden layer, and one output layer. The input data set $(x_1, x_2, \ldots, x_n)$ of this model is the X input data set introduced in Section 4.1.2, and Y is the output of the model, representing the number of web servers required in this system. Each neuron in the output layer represents one possible output of the model, and the value of this neuron represents the probability of that output. For example, if the value for output 1 is 0.8, the system has an 80% chance of needing only one running web server. We take the output neuron with the maximum value as the output (Y) of the system. The output Y can be used to evaluate the system or to generate a control signal for auto-scaling of the web servers.
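A minimal sketch of such a network in Keras is shown below, assuming five output classes and a hypothetical feature count; the actual hidden layer size was determined by the tuning described in Section 4.2.3:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 12   # hypothetical number of input features
n_classes = 5     # one output neuron per possible server count

model = Sequential([
    Dense(4, activation="relu", input_shape=(n_features,)),   # hidden layer
    Dense(n_classes, activation="softmax"),   # probability per server count
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```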

Figure 4.12: Neural Network Structure


4.2.2 Hidden Layer Structure

Since the number of neurons in the input and output layers is determined as per the previous subsection, we need to carefully consider how many neurons the hidden layer needs for good performance. Different hidden layer structures were tested along with parameter tuning; the results are shown in the next subsection.

4.2.3 Parameter Tuning

Before the training process, several parameters need to be determined, such as the batch size and the number of training epochs. The batch size is the number of samples propagated through the neural network at a time. One training epoch is one forward pass and one backward pass over all the training examples. These parameters can affect the model's accuracy, hence they should be considered carefully. We use grid search [39] for parameter tuning. We ran the grid search once for each possible structure (3, 4, 5, or 6 neurons in the hidden layer), thereby combining it with the testing of different hidden layer structures. The results are shown in Tables 4.1 to 4.4.
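A sketch of this search is shown below, assuming scikit-learn's GridSearchCV together with the (since deprecated) Keras scikit-learn wrapper; the grids mirror the batch sizes and epoch counts in Tables 4.1 to 4.4, and X_train, y_train are the training arrays prepared in Section 4.1:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model(hidden_neurons=3, n_features=12, n_classes=5):
    # Feature and class counts are illustrative placeholders.
    model = Sequential([
        Dense(hidden_neurons, activation="relu",
              input_shape=(n_features,)),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

param_grid = {"batch_size": [1, 2, 5, 10, 22, 25],
              "epochs": [10, 20, 30, 40, 50, 60, 70, 80, 90,
                         100, 150, 200]}
grid = GridSearchCV(KerasClassifier(build_fn=build_model),
                    param_grid, cv=3)
# grid_result = grid.fit(X_train, y_train)
```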

Table 4.1: Parameter Tuning (3 neurons in the hidden layer). Cells show the error for each combination of training epochs (rows) and batch size (columns).

Epochs   Batch=1      Batch=2      Batch=5      Batch=10     Batch=22     Batch=25
10       0.22260274   0.17465753   0.25684932   0.6952055    0.9589041    0.2979452
20       0.91780822   0.25684932   0.21917808   0.9589041    0.2602740    0.2602740
30       0.25000000   0.25684932   0.25684932   0.7226027    0.2294521    0.9965753
40       0.18150685   0.17465753   0.19863014   0.9178082    0.9143836    0.2568493
50       0.17465753   0.86301370   0.19863014   0.9965753    0.1986301    0.9965753
60       0.25684932   0.85273973   0.25684932   0.1815068    0.2568493    0.3835616
70       0.21917808   0.67808219   0.18150685   0.1986301    0.5171233    0.3527397
80       0.21917808   0.22945205   0.92808219   0.1986301    0.2979452    0.9143836
90       0.17465753   0.25684932   0.25684932   0.2568493    0.6541096    0.2089041
100      0.42808219   0.17465753   0.17808219   0.2568493    0.1746575    0.1986301
150      0.19863014   0.29109589   0.27397260   0.1986301    0.3801370    0.2910959
200      0.17465753   0.25684932   0.25342466   0.9178082    1.0000000    0.9965753


Table 4.2: Parameter Tuning (4 neurons in the hidden layer). Cells show the error for each combination of training epochs (rows) and batch size (columns).

Epochs   Batch=1      Batch=2      Batch=5      Batch=10     Batch=22     Batch=25
10       0.23972603   0.18150685   0.19863014   1.0000000    0.5958904    0.8972603
20       0.99315068   0.18150685   0.26027397   0.9212329    0.2568493    0.2945205
30       0.21917808   0.17465753   0.21917808   0.2945205    0.2568493    0.2020548
40       0.18150685   0.18493151   0.35273973   0.2568493    0.4315068    0.2568493
50       0.18150685   0.18150685   0.98630137   0.1746575    0.1986301    0.2568493
60       0.25684932   0.21917808   0.32191781   0.1986301    1.0000000    0.2568493
70       0.24657534   0.17465753   0.17465753   0.8938356    0.2568493    0.9143836
80       0.92123288   0.25684932   0.91780822   0.9143836    0.1746575    0.1746575
90       0.26027397   0.29109589   0.25684932   0.1746575    0.3561644    0.2568493
100      0.19863014   0.65068493   0.17465753   0.3356164    0.9965753    0.2568493
150      0.18493151   0.26369863   0.18493151   0.2397260    0.2397260    0.2397260
200      0.26369863   0.28424658   0.26712329   0.3219178    0.3219178    0.2397260

Table 4.3: Parameter Tuning (5 neurons in the hidden layer)
