
UPTEC F 14034

Degree project, 30 credits (Examensarbete 30 hp), September 2014

Cloud Optimization for Hot Rolling

Mikael Sunde


Faculty of Science and Technology (Teknisk-naturvetenskapliga fakulteten), UTH unit

Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0

Postal address: Box 536, 751 21 Uppsala

Telephone: 018 – 471 30 03

Fax: 018 – 471 30 00

Web page: http://www.teknat.uu.se/student

Abstract

Cloud Optimization for Hot Rolling

Mikael Sunde

Cloud computing is an emerging technology where a user can make use of the remote computational resources of a service provider instead of using a local cluster. Amazon is currently in the process of developing and launching its cloud service Amazon Elastic Compute Cloud (EC2). This report presents data on the performance of parallel global optimization methods in MATLAB applied to a partial hot rolling mill simulation, obtained by investigating theoretical models and benchmarking actual simulation models. The computations are done both on a local cluster and on Amazon EC2, showing that for the off-line optimization of a hot rolling mill simulation it is computationally as feasible to use Amazon EC2 as it is to use a local cluster. Some upper bounds on the expected speedup for various optimization methods are given; these indicate severe limitations in the parallel efficiency of MATLAB's GlobalSearch and identify issues with the parallelization of MultiStart and patternsearch.

Examiner: Tomas Nyberg
Subject reader: Maya Neytcheva
Supervisor: Kateryna Mishchenko


Sammanfattning

Optimization is a young field of mathematics, roughly 70 years old, which is used to find the best possible solutions to given problems. This can be of great interest to, for example, manufacturing companies: if the production process can be described by a mathematical model, it is in theory possible to use optimization to find better production methods or to steer the manufacturing in a way that reduces costs. Such computation, however, usually requires powerful computer clusters or supercomputers, something that not every company can afford to acquire.

As the Internet has developed into a global communication medium, a whole range of services delivered over the Internet has emerged, and it is now also possible to rent access to computer clusters. This means that there is suddenly a much larger range of potential users of computational clusters, both those who previously could not afford to buy the hardware and those who were not able to house the hardware at their workplace.

Amazon Elastic Compute Cloud is an example of such a cloud service and is part of Amazon Web Services. Together with MathWorks, the maker of the computing platform MATLAB, they are integrating the two services so that a user can easily rent a cluster of the desired size through a web interface and quickly connect to it from MATLAB.

For companies that develop software, this collaboration offers the chance to do all development in one and the same environment, MATLAB, with the possibility of scaling the computations from serial (to get things working), to locally parallel on a few cores (to get the parallel computations working), to large-scale clusters (for customers who need that computing power).

This report shows how some of the optimization methods included in MATLAB work and how they are affected by running on many cores, both locally and in the cloud. All of this is applied to a real problem in the form of a simulation of a hot rolling mill, where product properties such as sheet thickness, production time and forces on the rolls are computed and thus, in theory, can be optimized.


Contents

Sammanfattning
Contents
List of Tables
List of Figures
1 Introduction
2 Notation and Abbreviation
3 Theory
3.1 Hot rolling
3.2 Optimization
3.2.1 Symbolic vs. numerical optimization
3.2.2 Measure of discontinuity
3.3 Optimization methods and parallelism in MATLAB
3.3.1 MATLAB parallelism
3.3.2 MATLAB local optimization methods
3.3.3 MATLAB global optimization methods
3.3.4 MultiStart
3.3.5 Global Search
3.3.6 Simulated annealing
3.3.7 patternsearch
3.3.8 Genetic algorithms
3.4 Cloud computing
3.4.1 Cloud definition
3.4.2 Potential issues with cloud computing
3.4.3 Amazon Elastic Compute Cloud
4 The Hot Rolling Optimization Problem
4.1 Maximum achievable speedup
4.2 Persistent data
4.3 Normalizing coordinates, objective and constraints
4.4 Initial point generation
4.5 Load balancing
4.6 Modified convergence criteria
4.7 Dealing with discontinuities
4.8 Unused methods
4.8.1 Genetic algorithms
4.8.2 Simulated Annealing
5 Benchmarking the cloud
5.1 Network
5.2 General benchmark problems
5.3 Optimization benchmarks
6 Results
7 Discussion
8 Conclusion
References
A Rougher mill constraints
B Hardware
C Using Amazon EC2 with MATLAB
C.1 Registering an Amazon Web Services account
C.2 Registering a MathWorks account
C.3 Cloud Center setup
C.4 Creating and managing EC2 clusters
C.5 Connecting to your EC2 clusters in MATLAB
C.6 Running code on the clusters
C.7 Data transfer to and from clusters


List of Tables

4.1 RM dimensions
4.2 FinalPassCalc time fraction
4.3 patternsearch FinalPassCalc evaluations with persistence
4.4 Initial point generation effects
4.5 Speedup limitations of variation in objective evaluation time
4.6 fmincon 'TolX' sweep - accuracy
5.1 Optimization benchmarks - time to optimum
A.1 RM constraints
A.2 RM constraints; adjusted drafts, thickness and forces
B.1 Specifications of the hardware used
C.1 Cloud ports
C.2 MATLAB parallel methods overview

List of Figures

3.1 Schematic of rolls
3.2 Hot rolling mill
3.3 Rolling stand and pass notation
3.4 Strip profile
3.5 Roll deflection
3.6 Benchmarking \ operator
4.1 Serial parts of optimization algorithms
4.2 Maximum speedup of UseParallel
4.3 Effects of persistent variables on patternsearch
4.4 Mex-file evaluation time
4.5 Multiple starting points - time factor
4.6 Multiple starting points - time factor
4.7 Multiple starting points - speedup
4.8 parfor fmincon weak scaling
4.9 'active-set' exit flags
4.10 fmincon 'TolX' sweep - time
4.11 Discontinuities - isolated
4.12 Discontinuities - chaotic
4.13 Drafts only optimization - time
4.14 Drafts only optimization - accuracy
4.15 GA speedup
4.16 GA objective value
5.1 Memory throughput and parfor latency
5.2 Inter-worker data transfer throughput
5.3 Client-worker vs. batch-worker communication
5.4 Performance benchmarks - relative speed
5.5 Performance benchmarks - relative speed per worker
5.6 Benchmark optimization problems
5.7 Optimization benchmarks - MultiStart speedup
5.8 Optimization benchmarks - objective weight fmincon
5.9 Optimization benchmarks - objective weight GlobalSearch
5.10 Optimization benchmarks - objective weight patternsearch
5.11 Optimization benchmarks - SSF minimization
6.1 Hot-rolling naive vs. optimized results
6.2 Hot-rolling optimized results
C.1 AWS menu
C.2 AWS access key interface
C.3 Cloud Center login
C.4 Cloud Center AWS credentials
C.5 Cloud Center menu
C.6 Cloud Center cluster creation
C.7 Cloud Center cluster details
C.8 Cloud Center edit cluster
C.9 MATLAB parallel menu
C.10 MATLAB cluster profile manager toolstrip
C.11 MATLAB discover EC2 clusters
C.12 matlabpool communication schema
C.13 batch communication schema
C.14 createCommunicatingJob(...,'Type','pool') communication schema
C.15 createCommunicatingJob(...,'Type','spmd') communication schema
C.16 createJob communication schema
C.17 Cloud Center cluster details
C.18 WinSCP interface


1 Introduction

Rolling is a process through which metal slabs are pressed into metal sheets by passing the slab through pairs of rolls. It is a time- and energy-consuming process, and the quality of the final product varies as the equipment suffers wear and tear; it is not uncommon that large parts of the final product must be scrapped.

It is currently common for the adjustments made to the mills to be based on tabulated data. By instead making use of a mathematical model that simulates the entire mill, from room temperature to room temperature, it is theoretically possible to predict the quality of the final product to a higher degree of accuracy before production. This also makes it possible to apply mathematical optimization to search through rolling schedules and find an optimal choice with respect to some desirable quantity, which could be any property modelled in the simulation, such as energy consumption, roll forces or the dimensions of the final product.

While local optimization of such a model can be done relatively quickly, global optimization is much slower. For instance, to reach the ultimate goal of optimization during production (on-line optimization), the computing time must be reduced by several orders of magnitude. Using parallel computing to leverage multiple computational cores may contribute to speeding up the process, but it also requires access to parallel resources, such as a computer cluster. By using cloud services it is possible to have access to such clusters with only minimal infrastructure, allowing almost anyone to provision large clusters when they need them.

2 Notation and Abbreviation

function            Formatting used for MATLAB functions
parfor              MATLAB parallel for-loop
spmd                MATLAB single-program-multiple-data statement
Amazon EC2          Amazon Elastic Compute Cloud
Amazon S3           Amazon Simple Storage Service
AMI                 Amazon Machine Image
AWS                 Amazon Web Services
draft               Absolute measurement of decrease in thickness
GA                  Genetic Algorithms
GADS                MATLAB Global Optimization Toolbox
GO                  Global Optimization
HPCC                High Performance Computing Challenge
IaaS                Infrastructure as a Service
KKT                 Karush-Kuhn-Tucker
MS                  Multistart
PaaS                Platform as a Service
PCT                 MATLAB Parallel Computing Toolbox
PS                  Pattern search
reduction           Relative measurement of decrease in thickness
SA                  Simulated annealing
SaaS                Software as a Service
scaling, fixed size Scaling the number of cores while keeping the total work constant
scaling, weak       Scaling the number of cores while keeping the work per core constant
SQP                 Sequential quadratic programming
SSF                 Sum of squared forces
XaaS                Everything as a Service

3 Theory

3.1 Hot rolling

Rolling is an industrial process which can be used to create metal sheets. The basic idea is to pass metal slabs through one or more pairs of rolls (Fig. 3.1), each reducing the thickness of the metal slab by some determined amount until the metal is at the required thickness.


Figure 3.1: The rolls compress the metal so that it is thinner after the pass. Some elastic recovery may occur, meaning that the outgoing product may be thicker than the gap between the rolls.

Rolling is generally categorized into two types, hot and cold rolling: hot rolling is done with the metal heated above its recrystallization temperature, usually with a margin of at least 50 °C at the end of the process, while cold rolling is done below the recrystallization temperature [11]. The recrystallization temperature is the temperature limit above which the grain structure of the metal can be reformed with new grains, modifying the microstructure of the metal. Note that since the recrystallization temperature can be several hundred degrees Celsius [34], cold rolling is not necessarily done at room temperature.


For hot rolling, however, heating is required. Therefore the hot rolling process generally includes at least four stages: furnace, mills, cooling and coiling. These are used, in order, to 1) elevate the metal’s temperature, 2) shape the metal to the desired properties, 3) lower the temperature to prevent plastic deformation and 4) coil up the metal to prepare it for shipping (see Fig. 3.2).


Figure 3.2: a) The furnace heats up the metal above the recrystallization temperature. b) The roughing mill is the first of two mills; it compresses a slab several decimeters in thickness to a transfer bar only a few centimeters thick. c) The finishing mill compresses the transfer bar to a strip with the final dimensions. d) The runout table cools the metal actively, usually by spraying it with some liquid. e) The coiler takes the finished metal sheet, which is now about 50 times as long as the initial slab, and coils it up.

Several authors have gone into great depth describing the process, the possibilities and the problems of rolling in general and hot rolling in particular, e.g. [9, 11, 18, 19], and it is beyond the scope of this report to present the entirety of the subject. However, a brief introduction to some of the relevant basic issues is presented below.

A rolling mill may refer both to the entire production facility and to a set of rolls in the production line. For the specific mill (production facility) configuration used for the simulation models in this report, there are two mills (sets of rolls): first a roughing mill, then a reversible finishing mill (also known as a Steckel mill).

Each mill consists of one or more stands (with a mill consisting of more than one stand being called a tandem mill), where each stand has at least two rolls. In a regular mill, the metal is passed through each stand once, while in a reversible mill the metal may be stopped and reversed so that it is passed through the same mill multiple times. Note that for most reversible mills this requires the number of passes to be odd, as the final transportation direction must be the same as the initial transportation direction [19].

The metal going out of the furnace is referred to as a slab. When it has passed through the roughing mill, it is called a transfer bar, while after the finishing mill it is a strip (see Fig. 3.2).

In each pass, the physical properties of the metal, such as its thickness, velocity and temperature, may change. Following the variable naming convention in [19], variables denoting the parameters of the metal going into a stand for pass number i will be denoted by capital letters and indexed with i, while variables of outgoing parameters will instead use lower case letters (see Fig. 3.3).

Figure 3.3: There are numerous important physical quantities during each pass of hot rolling, such as the thickness (H, h), roll angular speed (ω), temperature (Θ, θ) and velocity (V, v).

As the thickness is significant for the rolling process, it is important to be able to describe the change in thickness caused by each pass through a stand. Using $H_i$ as the ingoing thickness and $h_i$ as the outgoing thickness, the draft is defined as $d_i = H_i - h_i$, that is, the absolute decrease in thickness. The corresponding relative change, $r_i = 1 - h_i/H_i$, is called the reduction [19].

The reduction in each pass is achieved by applying forces to the rolls so that they compress the metal between them. This, in turn, deforms the rolls, which may give rise to variations of the thickness of the metal over its cross section, called the profile (for an example, see Fig. 3.4). When rolling metal sheets, one of the challenges is to get the finished product to have a sufficiently even profile.

Finally, flatness is a measure of how flat the sheet can lie on a level surface without any external loads applied. There are numerous reasons why flatness defects can arise, such as the sides of the metal being rolled thinner, and thus longer, than the middle due to roll deflection (see Fig. 3.5), leading to buckling at the edges of the strip.

3.2 Optimization

Optimization is a mathematical process used to find extreme values of an objective function $f(x)$, subject to some constraints on which coordinates $x$ are acceptable. Without loss of generality, such an optimization problem can be defined as a minimization problem

$$\begin{aligned}
&\min_{x \in \mathbb{R}^n} f(x), \\
\text{s.t.} \quad & g_i(x) = 0, \quad i = 1, \dots, m, \\
& h_j(x) \le 0, \quad j = 1, \dots, n,
\end{aligned} \tag{3.1}$$

where the $g_i$ are equality constraints and the $h_j$ are inequality constraints. These constraints may be linear (i.e. of the form $Ax \le b$ or $Ax = b$) or nonlinear. A special case of linear inequality constraints are bounds, which set a constant limit on a coordinate. Definitions of a few common optimization-related terms follow below.

Figure 3.4: The deviation from the mean thickness depending on the cross-sectional position of the strip is called the profile.

Definition 3.1. A point $x$ is a feasible point for (3.1) if all constraints are satisfied at $x$.

Definition 3.2. The set of all feasible points is called the feasible set or the feasible region and is denoted D.

Definition 3.3. A feasible point $x^*$ is called a local optimizer if there exists an open ball $R$ around $x^*$ such that $\forall x \in D \cap R : f(x^*) \le f(x)$. For unambiguous referencing of the coordinate $x^*$, the value $f(x^*)$ and the complete set of information $(x^*, f(x^*))$, call $f(x^*)$ a locally optimal objective value and $(x^*, f(x^*))$ a local optimum.

Definition 3.4. In analogy with Def. 3.3, a local optimizer $x^*$ is also a global optimizer if $\forall x \in D : f(x^*) \le f(x)$. The globally optimal objective value and the global optimum are also defined in analogy with the local versions.

Optimization problems can be classified depending on the qualities of the objective function $f$ and the constraints $g_i$, $h_j$. The simplest case is when there are no constraints at all ($g_i = h_j = \emptyset$), called an unconstrained optimization problem. When at least one of the sets of constraints is not empty, it is instead called a constrained optimization problem; a common variant of this is the linear programming case, where $f$ is linear and $g_i$ and $h_j$ include only finitely many linear constraints. Linear programming has been studied in depth and there are several methods readily available for such problems, such as the simplex method [10].



Figure 3.5: a) Parallel rolls when not in contact with metal. b) Originally parallel rolls are deflected during rolling due to the forces applied from the metal on the rolls.

Should $f$ or any of the constraints not be linear, we obtain a nonlinear optimization problem; if they are at least continuously differentiable, the Karush-Kuhn-Tucker (KKT) conditions shown in (3.2), where $\mu$ and $\lambda$ are real coefficients, can be used as first order necessary conditions for minima [22, 23].

$$\begin{aligned}
\nabla f(x) - \sum_{i=1}^{m} \lambda_i \nabla g_i(x) - \sum_{j=1}^{n} \mu_j \nabla h_j(x) &= 0 \\
g_i(x) &= 0, \quad i = 1, \dots, m \\
h_j(x) &\le 0, \quad j = 1, \dots, n \\
\mu_j &\le 0, \quad j = 1, \dots, n \\
\mu_j h_j(x) &= 0, \quad j = 1, \dots, n
\end{aligned} \tag{3.2}$$

If continuous differentiability cannot be assumed, then there are still established methods for optimization, such as penalty methods and augmented Lagrangian methods for differentiable f and subgradient methods or trust region methods for nondifferentiable f [33]. While these are more complex than the linear methods they are nevertheless well-founded mathematically.

Once the objective function f cannot be assumed to be continuous, however, the problem becomes much more complex, especially when the notion of numerical optimization, as opposed to analytical optimization, comes into the picture.

3.2.1 Symbolic vs. numerical optimization

Consider, for a moment, the unconstrained optimization problem of a continuous, differentiable function $f$,

$$\min_{x \in \mathbb{R}} f(x) = x^2 \tag{3.3}$$

where it is clear that there is a single minimizer $x^* = 0$ with $f(x^*) = 0$. The KKT conditions (3.2) reduce to $\nabla f = 0$, which can be solved symbolically to give $x^* = 0$. By differentiating twice one finds $\nabla^2 f(0) = 2$, which indicates that $f$ is at least locally convex and that $x^* = 0$ is a minimum.

This is symbolic optimization: the optimum is found by working on the symbolic representation of $f$. Numerical optimization, on the other hand, cannot manipulate the symbolic expressions of the problem; for instance, a symbolic expression for the derivative would not be accessible during numerical optimization, but a numerical approximation of it would be. Of course, this comes with the limitation that only information from the evaluated points is available.

The example above may, for instance, be solved by using finite differences to approximate the derivative, e.g. the central finite difference

$$\nabla f \approx \frac{1}{2\delta} \left( f(x + \delta) - f(x - \delta) \right)$$

to find a direction in which $f(x)$ decreases, assuming a small enough $\delta$. One or more points at some distance in the found direction can then be evaluated to find the best choice. These steps can be repeated until the found point is close enough to optimal, stopping once the residual $r$ of the first equation in (3.2), i.e. $\nabla f = r$, is small enough; optimality can then be checked by estimating $\nabla^2 f$ to establish whether the final point is a minimum or a maximum (this condition is modified for constrained problems). For simple problems such as (3.3), points close to the minimum are easy to find.
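To make the loop just described concrete, the following MATLAB fragment applies the central difference and a simple step-halving line search to the example problem (3.3). It is a minimal sketch: the starting point, tolerance and line search are illustrative choices, not those of any particular MATLAB solver.

f     = @(x) x.^2;     % objective of (3.3)
x     = 3;             % arbitrary starting point
delta = 1e-6;          % finite-difference step
tol   = 1e-8;          % stop once the residual of grad f = 0 is small

g = (f(x + delta) - f(x - delta)) / (2*delta);   % central difference
while abs(g) > tol
    t = 1;                       % line search: halve the step length
    while f(x - t*g) >= f(x)     % until the objective value decreases
        t = t/2;
    end
    x = x - t*g;                 % move in the descent direction
    g = (f(x + delta) - f(x - delta)) / (2*delta);
end
% x now approximates the minimizer x* = 0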

When it comes to discontinuous $f$, however, it is possible to show that numerical optimization cannot in general guarantee that a true optimizer is found, much less that any optimizer found is also a global optimizer. Consider, for instance, the unconstrained optimization problem

$$\min_{x \in \mathbb{R}} f(x) = \begin{cases} |x|, & \text{if } x \ne \delta \\ -1, & \text{if } x = \delta \end{cases} \tag{3.4}$$

where $\mathbb{R} \ni \delta \ne 0$, for which it is easy to see that the optimizer is $x^* = \delta$, with an optimal objective value of $f(x^*) = -1$. Symbolic optimization can possibly handle this by treating the two cases separately, but pure numerical optimization only has access to information about the objective value at the points it has evaluated; as long as $x = \delta$ is not evaluated, there will be no information available that distinguishes the given function $f(x)$ from the continuous function $\tilde{f}(x) = |x|$, and thus nothing indicating that $x = \delta$ is the minimizer. This means that as long as the number of points evaluated is finite, the probability, in general, that $x = \delta$ has been evaluated before the termination of the program is 0.

3.2.2 Measure of discontinuity

In the context of numerical optimization, the measure of the discontinuity set of a function is significant. The example in (3.4) uses a point discontinuity, displacing a single point from an otherwise continuous but nondifferentiable function. This discontinuity is present only in a set $S$ with Lebesgue measure $\lambda(S) = 0$. If each evaluation of the function is a uniformly distributed independent random trial with respect to which coordinate $x \in D$ is being tested, where $D$ is the feasible region, then the Lebesgue measure implies that the probability of finding this region of discontinuity is $\lambda(S)/\lambda(D) = 0$ for each trial.

It is, however, apparent that assuming uniformly distributed independent random trials cannot be done in general; it’s only completely valid when using pure random search, which samples points at uniform random in the feasible domain (or a superset thereof) and returns the best point it finds [14]. Still, if certain minima are ignored, it is possible to talk about the accuracy of two-phase methods which use uniform random sampling to generate initial points for some local optimization method. Two-phase methods use a global and a local phase in an attempt to find the global minimum; see [14].


Defining the basin of attraction of a point $x^*$ as the set of points $x$ for which the numerical optimization converges to $x^*$, it is reasonable not to expect any numerical method to find minima associated with a basin of attraction of Lebesgue measure 0. Denote the smallest minimum with a basin of attraction of positive Lebesgue measure as the global Lebesgue minimum. Assume then that there exist a number of local minima $x_i^*$ with corresponding basins of attraction of Lebesgue measures $\lambda_i$ such that $\inf_i \lambda_i = \lambda_{\min} > 0$; that is, the Lebesgue measure of every basin of attraction is greater than or equal to some possibly small but positive value $\lambda_{\min}$. The probability $p_{GLM}$ that a point has been selected in the basin of attraction of the global Lebesgue minimum is then bounded below by

$$p_{GLM} \ge 1 - \left(1 - \frac{\lambda_{\min}}{\lambda(D)}\right)^N$$

where $N$ is the number of sampled initial points.
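Rearranging this bound (a step added here for illustration, with made-up numbers) gives the number of sampled initial points needed to reach a desired probability $p$:

$$N \ge \frac{\ln(1 - p)}{\ln\left(1 - \lambda_{\min}/\lambda(D)\right)}$$

For example, if the smallest basin of attraction covers 1% of the feasible region, then $p_{GLM} \ge 0.99$ requires $N \ge \ln(0.01)/\ln(0.99) \approx 459$ initial points.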

Thus, for sufficiently well-behaved discontinuous problems, such as those that can be subdivided into a finite number of problems with at least continuous objective functions, certain methods for continuous or even continuously differentiable optimization problems may be applied for global optimization, if they are built into a two-phase method where the global phase consists of random sampling covering the entire feasible region.

3.3 Optimization methods and parallelism in MATLAB

MATLAB offers a set of optimization methods through its two toolboxes, Optimization Toolbox and Global Optimization Toolbox (GADS), providing local and global optimization methods respectively. When combined with the Parallel Computing Toolbox (PCT) and possibly MATLAB Distributed Computing Server (MDCS), some of these optimization methods have predefined parallel implementations. A selection of the optimization methods will be described in Sections 3.3.2 and 3.3.3; for information on the other solvers, refer to [26] for the local solvers or [25] for the global solvers. Before that we will discuss the possibilities of parallel computations in MATLAB. For information on the toolboxes, see their respective MATLAB documentation [24–27].

3.3.1 MATLAB parallelism

MATLAB parallelism is based around the concept of workers. Each MATLAB worker is a separate instance of MATLAB which has its own workspace and can make use of a single core for computations. These workers may be accessed through matlabpool, allowing the use of commands like parfor, spmd or batch, or a task-centric approach can be taken using commands such as createJob and createTask.

parfor is an implementation of a parallel for-loop and requires a pool of available workers. MATLAB uses semi-dynamic scheduling to distribute the evaluations of the for-loop to different workers, in an attempt to load-balance while reducing communication overhead. Each iteration of the loop must be independent of the other iterations, as the execution is not guaranteed to be in order. All required data will be communicated in each iteration, but care must be taken when using the loop variable for indexing, as improper syntax will lead to unnecessary communication. Note that while it is syntactically possible to nest multiple parfor loops, only the iterations of the outermost loop will actually be distributed amongst the workers. When using parfor, data generated on the workers during the loop will not be accessible after the loop is completed; to retrieve the data it must be stored to a variable generated in the interactive session. This is the method used for the built-in parallelism in the optimization methods.
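As a minimal sketch of this pattern, the fragment below distributes independent iterations over a pool of workers; the pool size and the function expensiveObjective are hypothetical placeholders.

matlabpool open 4            % start a pool of 4 workers (syntax of this era)
n = 100;
results = zeros(1, n);       % sliced output variable, gathered on the client
parfor i = 1:n
    % iterations are independent and may run on any worker, in any order
    results(i) = expensiveObjective(i);   % hypothetical expensive function
end
matlabpool close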


spmd, short for Single-Program-Multiple-Data, differs from parfor in that instead of having a variable number of iterations to distribute, each worker in the current pool executes the same code. This can be useful for weak scaling (i.e. modifying the number of cores while keeping the work per core constant) or for creating distributed matrices too large to fit in one workspace. MATLAB also supports what it calls variant arrays, which have different values on different workers in an spmd block; in MATLAB they are the simplest way of realizing the multiple-data portion of spmd. Like parfor, multiple levels of spmd can be nested, but only the topmost level will be used.
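A small spmd sketch of a variant value (illustrative only; assumes an open pool):

spmd
    a = labindex;   % variant: each worker stores its own worker index
end
disp(a{2})          % on the client, a is a Composite; a{2} is worker 2's value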

batch is a MATLAB function that can be used to offload the execution of a script to a worker instead of running it in the same MATLAB process that is used for the interactive session. It is possible to make use of parallel commands, such as parfor and spmd, inside scripts run by batch.
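A minimal batch sketch; the script and variable names are hypothetical, and the cleanup uses the destroy call of the PCT releases of this era.

job = batch('myScript');          % run the hypothetical script myScript.m on a worker
waitForState(job, 'finished');    % block until the job has finished
load(job, 'resultVariable');      % copy a variable from the job's workspace
destroy(job);                     % free the job's resources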

Similar to using batch to evaluate code, MATLAB supports task-based parallelism using schedulers. This is done by assigning one or more tasks to a job and submitting the job to a scheduler. The scheduler, in turn, handles the access to the workers and distributes the jobs so that each job gets dedicated access to the necessary number of workers. Unlike parfor, spmd and batch, multiple commands are required to set this up. First, an existing scheduler can be found using findResource, after which jobs can be created by createJob for serial jobs or createParallelJob for parallel jobs. Jobs are populated with tasks using createTask; once all tasks are created, the job is submitted using submit. As the evaluation is asynchronous, waitForState can be used to hold further execution until the supplied job is completed, at which point getAllOutputArguments retrieves the data from the workers.
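Put together, the sequence looks as follows; the local scheduler and the sin tasks are illustrative stand-ins for real work.

sched = findResource('scheduler', 'type', 'local');  % locate a scheduler
job = createJob(sched);                              % create a serial job
createTask(job, @sin, 1, {pi/4});    % one task: 1 output argument, input pi/4
createTask(job, @sin, 1, {pi/2});    % a second, independent task
submit(job);                         % asynchronous execution begins
waitForState(job, 'finished');       % hold until the job completes
out = getAllOutputArguments(job);    % cell array {sin(pi/4); sin(pi/2)}
destroy(job);                        % free the job's resources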

As mentioned in Section 3.3, there are two MATLAB toolboxes used for explicit parallel computations: PCT and MDCS. PCT is required on the computer that the interactive MATLAB session is running on (called the client). It allows the use of up to 12 local workers (as of MATLAB R2011b), where a local worker uses resources local to the client. To use remote resources (such as accessing a cluster from outside of it) or to use more than 12 workers, MDCS licenses are required and the MDCS software should be installed on the computers which will host the workers. Either option can be used in conjunction with matlabpool to open access to a number of workers for parfor or spmd, and they may also be used to designate a scheduler and define jobs.

Finally, MATLAB also has some multicore support separate from the use of MATLAB workers. Certain functions or operators have been implemented so that they can make use of multiple cores on a local computer; one such example is the backslash operator, \, which is used to solve systems of linear equations or equivalent problems, i.e. for a relation Ax = b with known matrix A and column vector b, x=A\b [28].

Fig. 3.6 shows the amount of time it takes to perform the operation A\b on a computer with 8 cores, with various numbers of workers available in the pool. In such a case it is evident both that the native multicore support is independent of the worker pool size and that it is faster to solve the system using the native multicore support than to use distributed data inside spmd blocks. The only potential benefit of using distributed data is then on remote clusters, when working on matrices that are too large to fit into the memory of a single worker, or when one can use sufficiently more remote computing resources than there are local resources. By generating part of the matrix A locally on each worker and constructing it as a codistributed array, the whole matrix is not stored on any one worker while it can still be operated on as if it were a regular matrix.

Figure 3.6: The native multicore support for the \ operator is faster than working with dis- tributed data when only local resources are available.
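As a sketch of the codistributed construction mentioned above (the matrix size is illustrative and an open worker pool is assumed):

spmd
    A = codistributed.rand(4000);    % each worker holds only its own portion of A
    b = codistributed.rand(4000, 1);
    x = A \ b;                       % backslash operates directly on distributed data
end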

3.3.2 MATLAB local optimization methods

The optimization toolbox for MATLAB comes with a host of different optimization methods; which ones are applicable to a given problem depends on the constraints and the objective function. Only a small subset of these methods will be discussed here, as this report is not meant to be an overview of all the optimization methods in MATLAB. For more information on the other local optimization methods and algorithms, refer to [26].

As the hot-rolling problem, which is discussed further in Section 4, is possibly nonlinear or even discontinuous and has nonlinear constraints, the only fitting local solver is fmincon, and as such it is the only one of MATLAB's local optimization methods that will be discussed here.

fmincon is a collection of a few different optimization algorithms: active-set, interior-point, sqp (sequential quadratic programming) and trust-region-reflective. The last of these requires the gradient ∇f to be given as a function, which is not something the hot-rolling simulation provides, and thus it will not be discussed in this report.

All the algorithms that fmincon uses are gradient based; this means that they either directly calculate or somehow approximate gradients or Hessians (first and second order derivatives), such as the gradient ∇f, in order to perform the optimization. active-set and sqp are highly similar, consisting of three parts: approximating (or calculating) the Hessian of the Lagrangian; using the Hessian to create a quadratic programming subproblem, which is solved to give a search direction; and performing a line search in the search direction to obtain a sufficient decrease of a merit function.

The Lagrangian $L$ for the optimization problem (3.1) is

$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{n} \mu_j h_j(x) \tag{3.5}$$

where any nonlinear constraints have been linearized. Quadratic programming is optimization of a quadratic function

$$q(d) = \frac{1}{2} d^T H d + c^T d. \tag{3.6}$$

By setting $H$ as the Hessian and $c$ as the gradient, $q(d)$ is a quadratic estimate of the change in the objective value introduced by modifying the current coordinates by $d$. The merit function penalizes points that do not fulfill the constraints, so that it is possible for infeasible points to be intermediary points if their objective value is sufficiently better than the feasible points' objective value [26].
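For concreteness, a minimal fmincon call in the option syntax of this era; the objective, constraint, bounds and starting point are placeholders, not the rolling model of this report.

obj = @(x) sum(x.^2);                     % placeholder objective
nonlcon = @(x) deal(x(1)*x(2) - 1, []);   % placeholder inequality c(x) <= 0, no equalities
x0 = [2; 2];
lb = [0; 0]; ub = [10; 10];               % bound constraints
opts = optimset('Algorithm', 'sqp', ...   % select the sqp algorithm
    'UseParallel', 'always');             % distribute finite-difference evaluations
[x, fval] = fmincon(obj, x0, [], [], [], [], lb, ub, nonlcon, opts);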

3.3.3 MATLAB global optimization methods

MATLAB has five global optimization methods, applicable under various circumstances. The following sections describe each algorithm in general and MATLAB's implementation in particular.

3.3.4 MultiStart

MultiStart is MATLAB's version of multistart (MS), a two-phase global optimization method which samples starting points stochastically and runs some local deterministic descent algorithm to find a local minimum corresponding to each starting point [14]. MATLAB's implementation uses fmincon for the local descent and allows either calling the local solver in parallel or using the parallelism in fmincon (distributed function evaluations).
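A minimal MultiStart sketch; the problem definition is an illustrative placeholder.

ms = MultiStart('UseParallel', 'always');      % run the local solves in parallel
problem = createOptimProblem('fmincon', ...
    'objective', @(x) sum(x.^2), ...           % placeholder objective
    'x0', [2; 2], 'lb', [0; 0], 'ub', [10; 10]);
[xbest, fbest] = run(ms, problem, 50);         % 50 stochastic starting points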

3.3.5 Global Search

GlobalSearch is similar to MultiStart in that it generates a stochastic set of initial points and then runs a local solver, fmincon, on that set. The difference between the two is that GlobalSearch filters out initial points, without running the local optimization solver on them, if they are estimated to provide no improvement or to require too much work to give an improvement. This estimation is done by assuming that initial points which are sufficiently close to each other will converge to the same minimum, as well as by requiring that the objective value of the initial point is not too much larger than the best result so far.

Due to this ongoing filtering, GlobalSearch does not have a parallel implementation of its own, but it can still make use of the parallelism in fmincon. For more detailed information, see [25].

3.3.6 Simulated annealing

Simulated annealing, also called basin hopping, is a derivative-free optimization method which simulates the physical process of annealing. A single initial point is specified and the optimization keeps track of a virtual temperature. In each iteration, a new point is generated stochastically from the previous point, where a lower temperature implies that the newly generated point will be closer to the previous point.

The new point is accepted (and replaces the previous point) unconditionally if it has a lower objective value than the previous point. Should the value instead be higher, it may still be accepted, with a probability based on the temperature and objective value difference; a higher temperature means that the point is more likely to be accepted, while a higher objective value difference makes it less likely to be accepted.

MATLAB implements simulated annealing through simulannealbnd, which can only handle bound constraints. Furthermore, the implementation is fully serial and cannot be parallelized without modifying the algorithm (or manually running multiple instances of simulated annealing in parallel).
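A minimal simulannealbnd call with a placeholder objective and bounds:

[x, fval] = simulannealbnd(@(x) sum(x.^2), [2; 2], [0; 0], [10; 10]);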

3.3.7 patternsearch

patternsearch is MATLAB’s implementation of a direct search method, meaning that it does not use derivatives. Starting from a single initial point, a set of points is generated in each iteration based on the current point and a set of point offsets. These points are then evaluated and a new point is accepted for the next iteration if it has a lower objective value than the current point. Based on whether or not a new point is accepted, the pattern offsets are scaled either to cover a larger or smaller area.

MATLAB's implementation allows parallelism by distributing the evaluation of the points generated from the offsets.
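A minimal parallel patternsearch call (placeholder objective and bounds); note that distributing the poll requires complete polling to be switched on.

opts = psoptimset('UseParallel', 'always', 'CompletePoll', 'on');
[x, fval] = patternsearch(@(x) sum(x.^2), [2; 2], ...
    [], [], [], [], [0; 0], [10; 10], [], opts);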

3.3.8 Genetic algorithms

ga is MATLAB’s implementation of genetic algorithms, which draws its inspiration from natural selection. From an initial set of points (the population) a reproductive pressure is applied by making points with better objective value more likely to be selected to generate new points (children), which make up the next generation of the population.

MATLAB’s implementation of genetic algorithms has support for parallel computations by distributing the evaluation of the objective value of the children.
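A minimal parallel ga call with a placeholder fitness function of two variables:

opts = gaoptimset('UseParallel', 'always', 'Vectorized', 'off');
[x, fval] = ga(@(x) sum(x.^2), 2, ...            % fitness function and nvars
    [], [], [], [], [0 0], [10 10], [], opts);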

3.4 Cloud computing

3.4.1 Cloud definition

It is not entirely easy to describe what exactly defines a cloud and what sets it apart from other computing paradigms such as grid computing, cluster computing or supercomputing. Several authors have made efforts to classify clouds and introduce a taxonomy for them [6–8, 12, 15, 20, 21, 32], and there are several key concepts that recur throughout most such works.

The most important concept, which is sometimes left almost implicit in the definitions of the cloud, is that it should be accessible at any time, from any place; realistically this requires some form of communication of information, which is realized using the Internet. Then, similar to how one can use the electricity (a resource) in the electric grid (a resource infrastructure) as long as one has access to it, one could theoretically use the computing resources in the cloud as long as one has access to the infrastructure that is the Internet. This coincides well with the notion of utility computing voiced by John McCarthy in 1961, saying "computing may someday be organized as a public utility just as the telephone system is a public utility" and noting that "the computer utility could become the basis of a new and important industry" [16].

Following close to the concept of always-accessible is that cloud computing is provided as a service, not goods. An end user of cloud computing should not have to care about setting up or configuring the hardware that provides the underlying computing resources. While this is certainly doable in several ways, cloud computing almost universally makes use of virtualization. This not only adds a layer of abstraction that the end user cannot see past, in the form of a virtual machine, but also allows the resources to be partitioned dynamically, so that multiple users can share hardware without ever realizing it, and makes it easy to scale the resources provided to any user as long as there is unused hardware remaining. A quick introduction to virtualization can be found in [8], while more thorough material is presented in [35].

As the type of computing service provided varies greatly, a common classification of a cloud is by what it provides as a service. In cloud computing taxonomy this has become well established as XaaS, "Everything as a Service". Several categories have been proposed, but three are the most commonly used: Infrastructure as a Service (IaaS), Software as a Service (SaaS) and Platform as a Service (PaaS). IaaS is the most fundamental of the three; an IaaS cloud provides basic computing resources, such as storage, databases and processor cycles. Examples of IaaS clouds include Amazon Elastic Compute Cloud (http://aws.amazon.com/ec2/) and Google Compute Engine (https://cloud.google.com/products/compute-engine). From the pure computing aspect, such as for high performance computing, IaaS clouds are generally what is meant when one simply mentions cloud computing.

At the other end of the spectrum is SaaS, such as Google Drive (http://drive.google.com). The goal of SaaS is to provide the complete software and application solutions that users need. This can range from basic word processing, spreadsheet management and email services to more specialized software. The third category, PaaS, is somewhere in between the other two and focuses more on serving developers by providing a computing platform on which the users can develop and deploy their own software.

Finally, for commercial clouds, an important aspect, in combination with the opportunity for scaling, is that the cost of using the cloud should be proportional to how much of its resources you use, again in analogy with the electric grid or telephone system. By having little to no setup cost and no requirement to provide hardware of your own (which may be the case for less centralized grids), the cloud provides a utility accessible not only to large companies or research facilities.

In summary, a cloud is a computing service which is always accessible via the Internet with no large barriers to entry. It uses virtualization to encapsulate users on the same hardware while providing opportunities for massive scaling using virtual machines. If it is a service with a cost, the cost is proportional to how much one uses the cloud.



3.4.2 Potential issues with cloud computing

There are, of course, issues that arise when using cloud computing and users need to be aware of these issues to make an informed decision about whether or not to use such a service. For enterprises and researchers, the most significant issues are likely to be information security and cloud interoperability.

The issue with information security is at least two-fold; it touches on data control and security measures of a local network.

Data control must be considered if client data is uploaded to the cloud. The cloud can be assumed to be controlled by a third party and then it is no longer only the client who has access to the data. Depending on how that data is used, the options for security vary. Simply storing the data on a cloud storage service means it is possible to encrypt the data before uploading it.

However, while a client may certainly find a level of encryption which gives the data sufficient protection (such as the one-time pad encryption described by [31]), such levels of encryption may come with legal or practical issues.

If data needs to be operated on in the cloud, for instance when performing computations on large data sets, it is not necessarily possible to work on encrypted data without giving the service provider access to the encryption key, which suddenly opens up several risks. First, the key is now accessible in at least two locations, not just one. Second, the client has in general no direct influence on the security measures in place at the service provider. Third, it is possible that the service provider would give or sell the data on to a third party.

The other part of information security is that the client must be able to communicate with the cloud and the cloud must be able to communicate with the client. It is not uncommon for larger enterprises to have strict network security protocols in place which prohibit such communication. SaaS clouds, as well as PaaS clouds to some extent, may be able to render this issue less likely by using a browser-based interface and HTTPS.

The second large issue is cloud interoperability, alternatively phrased as vendor lock-in. As you rely on other parties to handle some of your data, it is important to consider what would happen if they at some point changed the cost of the services or stopped providing them entirely.

For IaaS clouds this tends to be less of a problem as those clouds are less likely to be very specialized, while SaaS clouds are most likely to be affected. Presuming that the software provided by the SaaS cloud is not open source, it is entirely possible that a discontinuation of the services would leave the users without any feasible method of reading their data.

3.4.3 Amazon Elastic Compute Cloud

Amazon's Elastic Compute Cloud (EC2, http://aws.amazon.com/ec2/) is an IaaS cloud that provides computing power and is part of the suite of cloud services Amazon provides under the banner of Amazon Web Services (AWS, http://aws.amazon.com). It is coupled together with Amazon's Simple Storage Service (S3), another IaaS cloud, which provides non-volatile memory for storage solutions. For a detailed introduction to setting up the use of EC2 with MATLAB, see Appendix C.

There are a few key concepts that are used in conjunction with EC2: Amazon Machine Images (AMI), EC2 instances and EC2 Compute Units (CU).



Instances

An EC2 instance is an instance of a virtual machine which can have multiple virtual cores. The computational power in EC2 is sold in the form of these instances, and depending on the user's requirements there are numerous different instance types to choose from [3]. These instances can be provisioned on an on-demand basis, which is one of the cornerstones of cloud computing (see Section 3.4.1), with a one-hour granularity in the pricing; this is referred to as instance-hours and billing is made for every initiated instance-hour. Amazon also offers two variants of the on-demand instances: reserved instances and spot instances.

Reserved instances are longer-term solutions which are bought on the scale of years instead of hours. With an annual utilization of at least 40%, any type of reserved instance will be more cost efficient than using on-demand instances [4].

Spot instances have a fluctuating price, updated every 5 minutes, and the client may bid for time on the spot instances. Amazon controls the pricing, and when a client's bid is higher than the current spot price, the client is assigned a spot instance at the current spot price. Like the on-demand instances, these are billed per instance-hour, with the caveat that if Amazon interrupts a client's spot instance, either because the spot price exceeds the bid price or because there is no capacity left to allocate to the spot instances, then the client is not billed for that partial instance-hour.

EC2 Compute Unit

EC2, as a cloud, relies on virtualization, and therefore it is possible that there will be multiple hardware configurations participating in the cloud. The client, of course, will never be fully aware of this due to the virtualization, but must somehow know what to expect from the various instances. The EC2 Compute Unit is an attempt to classify how much computing power an instance will give, independent of the hardware. According to Amazon, it is "the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor" [2].

Amazon Machine Image

When deploying an instance, Amazon must know what operating system to emulate and what software to have installed on the virtual machine; the Amazon Machine Image is the blueprint for exactly those things. Amazon has a number of different standard AMIs for use, but clients can also configure their own AMIs to suit their needs, such as by installing some form of screen-sharing software for interactive graphical control of the instance.

4 The Hot Rolling Optimization Problem

The main goal of this report is to further the work done in [1] with respect to the objective functions used for the rougher mill, and to investigate the possibilities, benefits and disadvantages of using cloud services, specifically Amazon EC2, to perform the optimization. Common to all the objective functions studied is that they are 12-dimensional; this comes from having the pass draft and roll speed as variables in each of five passes in the rougher mill, plus allowing a range of temperatures and thicknesses of the incoming slab, as shown in Table 4.1.

Two objective functions have been studied: minimization of thickness (h-minimization), which is a linear objective function, and minimization of the sum of the squared roll forces (SSF-minimization), which is nonlinear. Of these two, the SSF-minimization has been the main focus of the study, as it is the more complex one. Various constraints and degrees of strictness of the constraints have been used; for a full listing see Appendix A. Following the results in [1], MultiStart and patternsearch are scrutinized more closely, and fmincon's built-in parallelism will be tested both on its own and in GlobalSearch. A short argument as to why the other optimization methods were not considered can be found in Section 4.8.

Table 4.1: Each dimension in the rougher mill problem corresponds to a physical quantity.

Dimension   Physical interpretation
1           Draft, pass 1
2           Draft, pass 2
3           Draft, pass 3
4           Draft, pass 4
5           Draft, pass 5
6           Roll speed, pass 1
7           Roll speed, pass 2
8           Roll speed, pass 3
9           Roll speed, pass 4
10          Roll speed, pass 5
11          Deviation in input temperature
12          Input slab thickness

A few characteristics of the model of the roughing mill used in this report should be noted.

First, at the core of the computation is a .mex-file (a MATLAB executable) called FinalPassCalc, which simulates the rougher mill given certain input. It requires not only the optimization variables but also a large set of other inputs (approximately 3.5 MB of data), and it returns a struct containing a slew of different parameters, including the forces on the rolls. Any of these parameters may be included in the construction of the objective functions, which decides what is actually being optimized, as well as in the constraints on the problem. For the thickness minimization, the outgoing thickness from the rougher mill, the transfer bar thickness, is linearly dependent on the input parameters (and the linear constraint on the problem is in fact a maximum deviation in thickness from a target value). Compared to the rest of the work during an optimization with fmincon or patternsearch, FinalPassCalc takes the absolute majority of the time, as shown in Table 4.2.

Table 4.2: During an optimization using either fmincon or patternsearch, evaluation of the .mex-file FinalPassCalc takes at least 98% of the time.

Optimization function   Fraction of run time
fmincon                 99.66%
patternsearch           98.91%

4.1 Maximum achievable speedup

When trying to improve the performance of any computer program or algorithm, it is important to know where large improvements might be made. Specifically, when trying to improve the speed of an algorithm by parallelizing it, Amdahl’s law gives an upper limit on the expected achievable speedup for a constant problem size.
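In its standard form (stated here for completeness, in notation not used elsewhere in this report), Amdahl's law says that if a fraction $p$ of the run time is parallelizable and the remaining fraction $1 - p$ is serial, the speedup on $N$ cores is bounded by

$$S(N) = \frac{1}{(1 - p) + p/N} \le \frac{1}{1 - p}.$$

Taking the FinalPassCalc fractions of Table 4.2 as the parallelizable part, $p \approx 0.99$ would cap the achievable speedup at roughly 100, regardless of the number of cores.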
