Digging deep: A data-driven approach to model reduction in a granular bulldozing scenario


2018-10-08

Author: Samuel Ulin (saul0018@student.umu.se)
Supervisors: Martin Servin, Tomas Berglund
© Samuel Ulin, 2018


Abstract

The granular simulation method of the physics engine AGX Dynamics is a nonsmooth variant of the popular Discrete Element Method (DEM). While powerful, there is a need for close to real-time simulations at a higher spatial resolution than is currently possible. In this thesis a data-driven model reduction approach using machine learning was considered. A data-driven simulation pipeline was presented and partially implemented. The method consists of sampling the velocity and density field of the granular particles and teaching a machine learning algorithm to predict the particles' interaction with a bulldozer blade as well as the time evolution of the velocity field. A procedure for producing training scenarios and training data for the machine learning algorithm was implemented, as well as several machine learning algorithms: a linear regressor, a multilayer perceptron and a convolutional neural network. The results show that the method is promising; however, further work is needed to determine whether the pipeline is feasible to implement in a simulation.


Contents

1 Introduction
  1.1 Background
  1.2 Aim
    1.2.1 Proposed simulation model
  1.3 Delimitations
  1.4 Previous work
2 Theory
  2.1 Discrete element method
    2.1.1 Smooth DEM
    2.1.2 Non-smooth DEM
  2.2 Iterative solvers
    2.2.1 Projected Gauss-Seidel
    2.2.2 Conjugate Gradient
  2.3 Data driven model reduction
    2.3.1 Multivariate linear regression analysis
    2.3.2 Neural networks
    2.3.3 Convolutional neural networks
    2.3.4 Computational complexity of neural networks
3 Method
  3.1 Choice of approach
    3.1.1 Direct-iterative splitting
    3.1.2 Projection at position update
    3.1.3 Pre-conditioning of iterative solver
    3.1.4 Conjugate Gradient solver
    3.1.5 Model-based model reduction
  3.2 Training and validation data
    3.2.1 Field sampling
    3.2.2 Efficient storage
    3.2.3 Construction of training scenarios
    3.2.4 Algorithm for simulating training data
  3.3 Construction of learning algorithm
    3.3.1 Objective of learning algorithm
    3.3.2 Optimization algorithms
    3.3.3 Multivariate linear regression
    3.3.4 Multi-layer perceptron
    3.3.5 Convolutional Neural Network
4 Results
  4.1 Field sampling
  4.2 Machine learning algorithms
    4.2.1 Multivariate linear regressor
    4.2.2 Multilayer Perceptron
    4.2.3 Convolutional neural networks
5 Discussion
  5.1 Neural network architecture, optimization and training
  5.2 Field sampling method
  5.3 Trajectory algorithm
  5.4 Proposed simulation model
6 Conclusion


1 Introduction

1.1 Background

In several industries, granular materials such as sand, ore and pellets are often used. Understanding the dynamics of granular materials and being able to simulate these materials is therefore of huge interest. There are a number of different methods for simulating granular materials, among them the Discrete Element Method (DEM). The main idea behind DEM is to divide the granular material into representative particles with appropriate bulk properties and to simulate the contact dynamics between these particles. These particles may or may not represent the actual size of the particles of the considered granular material; in general the actual particles are too small to allow simulations using their real size. Collisions between particles often occur during very short time scales, which in turn constrains the time resolution of DEM to match these time scales. One method which allows for larger time steps is the Non-smooth Discrete Element Method (NDEM)¹. Instead of using an explicit integration method to resolve the dynamics as in DEM, NDEM uses an implicit scheme. With an implicit scheme the contacts no longer need to be resolved in time, which allows for much larger time steps.

While NDEM allows for orders of magnitude larger time steps than DEM, simulating large volumes of granular material requires large computational resources. AGX Dynamics is a professional multi-purpose physics engine developed by Algoryx Simulation AB. AGX Dynamics has an implementation of NDEM which allows for simulation of granular bodies as well as their interactions with rigid bodies. It is in the interest of Algoryx Simulation AB to be able to offer granular simulations at close to real-time for earthmoving scenarios.

Earthmoving is the act of cutting or pushing earth by means of e.g., a bulldozer or ex-cavator to prepare land for future construction, mining and more. Applications of fast and accurate earthmoving simulations are e.g., planning of future work, development of autonomous earthmoving equipment, and optimization of equipment parameters such as bulldozer speed or blade cut depth. In order to allow this planning to be done day-to-day the simulations need to be able to finish between the work days. This drives the need to approach real-time simulations.

¹ A very similar method is known as nonsmooth Contact Dynamics; however, this method treats …


The computational complexity of the current simulation model using NDEM scales as $O(N_c^{4/3})$, where $N_c$ is the number of contacts between particles. The proposed data-driven simulation model in this thesis would instead scale as $O(N_{\mathrm{grid}})$, where $N_{\mathrm{grid}}$ is the number of grid points, partially independent of the number of particles. Furthermore, the proposed simulation model is highly parallelizable and can be run on a GPU with several thousand threads.

Currently an earthmoving simulation with $3 \cdot 10^5$ particles, a time step of $\Delta t = 0.01$ s and 100 - 500 solver iterations can be run at between a hundredth and a thousandth of real-time, depending on the desired accuracy of the simulation. The testing rig for these simulations was an Intel(R) Core(TM) i7-7820X CPU @ 3.60 GHz using a Parallel Projected Gauss-Seidel solver with 16 threads. This is a very high spatial resolution simulation. A simulation using fewer particles, a typical time step of $\Delta t = 0.02$ s and 50 iterations can currently be run in real time.

1.2 Aim

The aim of this project is to consider a number of different approaches to speed up the granular simulations of AGX Dynamics in earthmoving scenarios. Furthermore, the most promising approach should be implemented and tested.

It was concluded that a data-driven model reduction had the highest potential for the large speed-ups needed. The chosen model consists of sampling the velocity and density field of an accurate NDEM simulation and using that data to train a machine learning algorithm. The machine learning algorithm will then be used to predict the time evolution of the granular material's velocity field as well as its interaction forces with the bulldozer blade. The predicted velocity field may then be used to move the granular material. Finally, a position based projection will be used to resolve any overlaps among the particles.

1.2.1 Proposed simulation model


Later, during a live simulation, the same algorithm used for sampling the field data is used to produce input to the trained machine learning algorithm. The predictions of the algorithm are then used to calculate the force on the tool as well as to move the granular particles. The granular particles are moved along the predicted velocity field and any conflicts are resolved using a few iterations of a position projection method. A schematic of the planned pipeline can be seen in figure 1.1.

[Figure 1.1 flowchart: Plan scenarios → Gather training data → Train machine learning algorithm → Validate machine learning algorithm → Sample field during live simulation → Predict force and torque on blade → Predict time evolution of velocity → Move particles and project]

Figure 1.1: The proposed complete training pipeline and simulation loop for earthmoving using a data-driven model reduction approach.

1.3 Delimitations


Figure 1.2: Illustrations of the considered granular particle beds. To the left is a small mound and to the right is a sharp down step. Note that v denotes the forward direction of the blade and that g denotes the direction of gravity.

Furthermore it was decided that this project will be delimited to work only towards training a machine learning algorithm to predict the interaction between the particles and the bulldozer blade. The aim of this approach is thus to provide a proof of concept for possible future work.

1.4 Previous work


2 Theory

2.1 Discrete element method

The Discrete Element Method (DEM), developed by [5], is a method for simulating granular materials particle by particle and contact by contact. However, the simulated particle sizes are in many cases larger than the corresponding real particle sizes in order to reduce the computational load. The particle properties are then calibrated in such a way that the bulk behaviour of the real material is achieved.

The simulated particles are allowed 6 degrees of freedom (DOF). The position and rotation of a particle are collected in a common vector denoted $x_i \in \mathbb{R}^7$, where the first three components correspond to the particle's translational DOF and the last four components correspond to the particle's rotational DOF expressed in quaternions. The velocity and angular velocity of a particle are similarly collected in a common vector denoted $v_i \in \mathbb{R}^6$, with translational and rotational velocity packaged as before. The force and torque on the particle are collected in the same way in a vector $f_i \in \mathbb{R}^6$. The mass and moment of inertia of a particle are collected in the matrix $M_i = \mathrm{diag}(m_i \mathbf{1}_{3\times3}, I_i)$, where $m_i$ is the mass of the particle and $I_i \in \mathbb{R}^{3\times3}$ its moment of inertia. In a contact between two particles, the geometric overlap is denoted by $g(x)$, and the total number of contacts in a system is $N_c$.

The interactions between particles can be resolved in time using an explicit integration method, as in the smooth discrete element method (SDEM). In contrast, NDEM avoids resolving the interactions in time by using an implicit integration method.

2.1.1 Smooth DEM

In SDEM the collisions between particles are resolved in time. The contact forces between particles can either be modelled with a linear relation such as Hooke's law or with a nonlinear model from Hertzian contact theory. The nonlinear Hertz model follows from the theory of linear viscoelastic materials. The normal contact force between two spherical bodies becomes


with spring stiffness $k_n = E\sqrt{2d}/3(1 - \nu^2)$ and damping coefficient $c = 4(1 - \nu^2)(1 - 2\nu)\eta/15E\nu^2$, where $E$ is the Young's modulus, $\eta$ is the material viscosity constant, $\nu$ is the Poisson's ratio and $d = (d_i^{-1} + d_j^{-1})^{-1}$ is the effective diameter between particles $i$ and $j$.

The tangential forces can be modelled using Hooke's law, where the spring elongation is defined as the integral of the relative tangential velocities between the particles,

$$f_t = -\int k_t v_t^{\mathrm{rel}} \, dt, \qquad (2.2)$$

where $v_t^{\mathrm{rel}}$ is the tangential component of the relative velocity between the contacting particles and $k_t$ is the tangential spring constant.

Rolling resistance can be included in the model by once again using Hooke's law. It takes the same form as equation 2.2,

$$\tau_r = -\int k_r w_t^{\mathrm{rel}} \, dt, \qquad (2.3)$$

where $w_t^{\mathrm{rel}}$ is the relative rotational velocity between the contacting particles and $k_r$ is the rotational spring constant.

There is no fundamental relation between the properties of the material and the tangential and rotational spring constants; they must thus be determined by comparison to real experiments. Both the tangential force and the rolling resistance torque are limited by the Coulomb law and the rolling resistance law respectively. The Coulomb law states that

$$f_t \le \mu |f_n|, \qquad (2.4)$$

with friction coefficient $\mu$, and the rolling resistance law similarly states that

$$\tau_r \le r \mu_r |f_n|, \qquad (2.5)$$

where $r$ is the effective contact radius and $\mu_r$ a friction coefficient similar to that of equation 2.4. For a more detailed derivation of the contact model see e.g., [6].

The above equations can then be integrated using a suitable integration method. If a semi-explicit method is used, the timestep, $h$, must be smaller than the shortest time scale in the system, $\sqrt{m/k}$, where $k$ corresponds to the largest spring constant of equations 2.1 and 2.2.

2.1.2 Non-smooth DEM


constraints and complementarity conditions. The parameters of the constraints are modelled in such a way that they map to the nonlinear Hertz model of equation 2.1 and the friction model of equation 2.2. The compliance potential and dissipation function of the constraints are given by

$$U = \tfrac{1}{2} \varepsilon_n^{-1} \bar{g}^T \bar{g}, \qquad (2.6)$$

and

$$R = \tfrac{1}{2} (\bar{G}_n v)^T \gamma_n^{-1} (\bar{G}_n v). \qquad (2.7)$$

The parameters in the compliance potential and the dissipation function are mapped to the parameters of the nonlinear Hertz model by

$$\bar{g} = g^{e_H}, \quad \bar{G}_n = e_H g^{e_H + 1} G_n, \quad \varepsilon_n^{-1} = k_n / e_H, \quad \gamma_n^{-1} = k_n c / e_H,$$

with $e_H = 5/4$, and $G_n$ is the Jacobian matrix of the geometrical constraint in the contact normal direction.

The constrained equations of motion when both Coulomb friction and rolling resistance are included become [7]

$$
\begin{aligned}
M\dot{v} &= f_{\mathrm{ext}}(x, v, t) + G_n^T \lambda_n + G_t^T \lambda_t + G_r^T \lambda_r \\
0 &\le \varepsilon_n \lambda_n + g_n + \tau_n G_n v \;\perp\; \lambda_n \ge 0 \\
\gamma_t \lambda_t + G_t v &= 0, \quad |\lambda_t| \le \mu |G_n \lambda_n| \\
\gamma_r \lambda_r + G_r v &= 0, \quad |\lambda_r| \le r \mu_r |G_n \lambda_n|,
\end{aligned}
\qquad (2.8)
$$

where $G_n^T \lambda_n$ are the non-penetration constraint forces in the normal direction with Lagrange multiplier $\lambda_n$, and $G_t^T \lambda_t$ and $G_r^T \lambda_r$ are the corresponding constraint forces for the contact friction and rolling resistance. The frictional and rolling resistance constraints impose a zero tangential velocity, $G_t v$, and a zero rotational contact velocity, $G_r v$, as long as the tangential force and maximum torque are below the bounds given by the Coulomb law friction coefficient $\mu$ and the rolling resistance law coefficient $\mu_r$ and effective contact radius $r$.


AGX Dynamics uses the SPOOK stepper [8] for numerical integration, which involves solving a mixed linear complementarity problem (MLCP) of the form

$$Hz = w_i + w_u, \qquad 0 \le z - l \perp w_i \ge 0, \qquad 0 \le u - z \perp w_u \ge 0, \qquad (2.9)$$

where

$$
H = \begin{bmatrix}
M & -G_n^T & -G_t^T & -G_r^T \\
G_n & \Sigma_n & 0 & 0 \\
G_t & 0 & \Sigma_t & 0 \\
G_r & 0 & 0 & \Sigma_r
\end{bmatrix},
\quad
z = \begin{bmatrix} v_{i+1} \\ \lambda_{n,i+1} \\ \lambda_{t,i+1} \\ \lambda_{r,i+1} \end{bmatrix},
\quad
b = \begin{bmatrix} -M v_i - h M^{-1} f_{\mathrm{ext}} \\ \frac{4}{h} \Upsilon_n g_n - \Upsilon_n G_n v_i \\ 0 \\ 0 \end{bmatrix},
$$

with $l$ and $u$ as temporary slack variables which are used by the solver, and

$$
\Sigma_n = \frac{4}{h} \frac{\varepsilon_n}{1 + \frac{4\tau_n}{h}} \mathbf{1}_{N_c \times N_c},
\quad
\Sigma_t = \frac{\gamma_t}{h} \mathbf{1}_{2N_c \times 2N_c},
\quad
\Sigma_r = \frac{\gamma_r}{h} \mathbf{1}_{2N_c \times 2N_c},
\quad
\Upsilon = \frac{1}{1 + \frac{4\tau_n}{h}} \mathbf{1}_{N_c \times N_c},
$$

constructed from the parameters of equations 2.6 and 2.7 together with the parameter $\tau_n$, which controls the dissipation rate in the normal contact force.

2.2 Iterative solvers

The computational cost of solving equation 2.8 directly is generally too high for large contact networks such as those resulting from NDEM¹. Because of this, iterative methods are favored in practice. There are two classes of iterative methods for linear systems: stationary iterative methods and Krylov subspace methods. AGX Dynamics currently uses a parallel projected Gauss-Seidel (PPGS) solver, which is a stationary iterative method.

The computational time $t_{\mathrm{comp}}$ required to simulate $t_{\mathrm{real}}$ seconds using a DEM is given by

$$t_{\mathrm{comp}} = \frac{t_{\mathrm{real}}}{\Delta t} \cdot (t_{\mathrm{col}} + t_{\mathrm{solve}} + t_{\mathrm{oh}}), \qquad (2.10)$$

where $\Delta t$ is the time step size, $t_{\mathrm{col}}$ is the collision computation time, $t_{\mathrm{solve}}$ is the iterative solver computation time and $t_{\mathrm{oh}}$ is the computational time for overhead computations.

¹ Servin et al. [9] state that the best theoretical scaling for direct solvers has time complexity $O(N^{(1+n_d)/2})$

When using NDEM, the major bottleneck of equation 2.10 is the time taken by the iterative solver. The computational time of the iterative solver is modelled by [10]

$$t_{\mathrm{solve}} = K_c N_c N_{\mathrm{it}} / S_{||}, \qquad (2.11)$$

where $K_c$ is the time required per contact and iteration, $N_{\mathrm{it}}$ is the number of iterations and $S_{||}$ is the speed-up given by parallelization. The scaling of the number of contacts is given by $N_c \propto \alpha N_p$, where $N_p$ is the number of particles and $\alpha$ the number of contacts for any given particle. In 3D, $\alpha$ is typically in the range of 6 to 8.

2.2.1 Projected Gauss-Seidel

The Gauss-Seidel (GS) iterative method for solving linear systems is a stationary iterative method. In order to solve the MLCP of equation 2.9 using GS, Servin et al. [9] describe a method which is presented here in short. First, the system of equations 2.8 is abbreviated as

$$
\begin{bmatrix} M & -G^T \\ G & \Sigma \end{bmatrix}
\begin{bmatrix} v \\ \lambda \end{bmatrix}
=
\begin{bmatrix} p \\ q \end{bmatrix},
\qquad (2.12)
$$

and the linear system is split on Schur complement form,

$$
\begin{aligned}
(G M^{-1} G^T + \Sigma) \lambda &= q - G M^{-1} p, \\
v &= M^{-1} p + M^{-1} G^T \lambda.
\end{aligned}
\qquad (2.13)
$$

The Schur matrix is then split as $S = G M^{-1} G^T + \Sigma = L + D + L^T$, where $L$ is the lower triangular part of $S$ and $D$ is its diagonal. An update formula for contact $\alpha$ is then constructed, using the block indices $\alpha$ and $\beta$,

$$D_{\alpha\alpha} \lambda^{(\alpha)}_{k+1} + r^{(\alpha)}_k - D_{\alpha\alpha} \lambda^{(\alpha)}_k = 0, \qquad (2.14)$$

using the residual vector

$$r^{(\alpha)}_k = S_{(\alpha\alpha)} \lambda^{(\alpha)}_k + G_{(\alpha)} M^{-1} p^{(\alpha)}. \qquad (2.15)$$

The normal components of the constraint forces are first solved using this method. If the normal force is non-zero, the tangential forces are also calculated. If the tangential forces violate the Coulomb law, they are projected back onto the friction cone according to


Servin and Wang [10] show that the number of iterations of PGS needed to reach an error $\epsilon$ scales as $N_{\mathrm{it}} = c N_p^{\gamma} / \epsilon$, with $c \approx 0.1$, an empirically determined value, and the exponent $\gamma = 1/n_D$, where $n_D$ is the number of spatial dimensions considered. Thus, when also considering that the number of calculations per iteration is linear in the number of contacts, the PGS algorithm in 3D scales as $O(N_c^{4/3})$.

2.2.2 Conjugate Gradient

The Conjugate Gradient (CG) method is a so-called Krylov subspace method. In theory, this class of methods solves the equations exactly to machine precision in a finite number of steps; however, the number of steps needed can be very large. Luckily, sufficient accuracy can be reached long before machine precision. According to [11] the CG method has a time complexity of $O(m\sqrt{\kappa})$, where $m$ is the number of non-zero elements of the matrix and $\kappa$ the condition number of the matrix. Since the number of non-zero elements in equation 2.8 is proportional to the number of contacts, the time complexity of the CG method is $O(N_c\sqrt{\kappa})$ when using NDEM.

2.3 Data driven model reduction

A model reduction consists of changing the representation of the problem such that the number of degrees of freedom is reduced, leading to a lower computational complexity. This change in representation might lead to lower accuracy, depending on whether the removed degrees of freedom are active in the system or not. The reduced number of degrees of freedom ultimately leads to a problem which is computationally cheaper.

In this thesis multivariate linear regression analysis and neural networks were used to perform model reduction. What follows is only a short review of the fundamentals of multivariate linear regression analysis and neural networks. For a more thorough treatment of the subject see [12].

2.3.1 Multivariate linear regression analysis

The goal of multivariate regression analysis is to construct a system that takes an input $x \in \mathbb{R}^n$ and predicts an output $y \in \mathbb{R}$. The output is a linear function of the input. A prediction given by the linear model is thus

$$\hat{y} = w^T x, \qquad (2.17)$$

where the vector $w$ is known as the weights of the model.

The accuracy of such a linear model on a set of inputs and outputs can be measured using the mean squared error,

$$\mathrm{MSE} = \sum_i \frac{1}{N} (\hat{y}_i - y_i)^2, \qquad (2.18)$$


where $N$ is the number of samples in the set. Note that the mean squared error is simply the squared $L_2$-norm of the distance between the prediction vector $\hat{y}$ and the output vector $y$, divided by the number of elements,

$$\mathrm{MSE} = \frac{1}{N} \|\hat{y} - y\|_2^2. \qquad (2.19)$$

The error of the model on the set is minimized when the mean squared error is at its minimum. Since equation 2.19 is a convex function of the weights, it attains its minimum at the point where its gradient is zero,

$$\nabla_w \mathrm{MSE} = 0. \qquad (2.20)$$

This reduces to solving the normal equation of equation 2.17, whose solution is

$$w = (X^T X)^{-1} X^T y, \qquad (2.21)$$

where $X$ is the matrix whose rows are the input vectors corresponding to the outputs in $y$. The theory is easily extended to a system that maps an input vector $x \in \mathbb{R}^n$ to an output vector $y \in \mathbb{R}^m$,

$$\hat{y} = W x, \qquad (2.22)$$

where $W$ is a matrix of weights in which each row corresponds to a weight vector as in equation 2.17. Each of these row vectors of weights can be optimized according to equation 2.21.

It is also possible to solve the minimization of equation 2.19 iteratively. A popular choice is the method of gradient descent. The method consists of choosing a starting point $w_0$ and then calculating the gradient at this point. The gradient always points in the direction of steepest ascent; thus the negative gradient points in the direction of steepest descent, and moving along it decreases the error. The update scheme for the gradient descent method is

$$w_{i+1} = w_i - \alpha \nabla_w \mathrm{MSE}(w_i), \qquad (2.23)$$

where $\alpha$ is a parameter determining the size of the step along the negative gradient. In machine learning this step size is known as the learning rate.
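To make the update rule of equation 2.23 concrete, the following is a minimal sketch of gradient descent for a multivariate linear regressor in NumPy. The data, the learning rate and the number of steps are arbitrary illustration values, not taken from the thesis.

```python
import numpy as np

def fit_linear_regressor(X, y, learning_rate=1e-2, n_steps=1000):
    """Fit y ~ X w by gradient descent on the mean squared error (eq. 2.23)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)                       # starting point w_0
    for _ in range(n_steps):
        residual = X @ w - y                       # (y_hat - y) for all samples
        grad = (2.0 / n_samples) * X.T @ residual  # gradient of the MSE w.r.t. w
        w -= learning_rate * grad                  # step along the negative gradient
    return w

# Illustrative usage on synthetic data (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.01 * rng.normal(size=200)
print(fit_linear_regressor(X, y))  # should approach true_w
```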

2.3.2 Neural networks

A neural network in its simplest form is very similar to a sequence of linear regressors. The input to the network, $x \in \mathbb{R}^n$, is mapped onto a number of outputs $\hat{y} \in \mathbb{R}^m$ according to

$$\hat{y} = A(Wx) = \sum_{i<m} A\big((Wx)_i\big)\, e_i, \qquad (2.24)$$


where $W \in \mathbb{R}^{m \times n}$ is a matrix whose rows are the weights mapping $x$ to the corresponding outputs in $\hat{y}$, $A: \mathbb{R}^m \to \mathbb{R}^m$ is a non-linear function which operates element-wise on the vector $Wx$, and $e_i$ is the $i$:th unit vector. The function $A$ is known as the activation function.

Figure 2.1: An illustration of a single layer neural network [13] in the form of equation 2.24. The leftmost black dots in the image represent the input $x \in \mathbb{R}^3$. The arrows extending from the input to the large grey circles represent the weights $W \in \mathbb{R}^{3\times3}$. The grey circles represent the activation function $A: \mathbb{R}^3 \to \mathbb{R}^3$. The arrows extending to the right of the grey circles represent the output $\hat{y} \in \mathbb{R}^3$.

The neural network presented in equation 2.24 is a single layer network. An illustration of the single layer network can be seen in figure 2.1. The notion of a layer comes from the fact that each matrix-vector multiplication forms an output vector that is multiplied by yet another weight matrix. Each intermediate vector formed is known as a layer and its elements are known as nodes. It is trivial to extend the notion of a single layer neural network to a multilayer neural network,

$$\hat{y} = A_2\big(W_2 A_1(W_1 x)\big), \qquad (2.25)$$

where $W_1 \in \mathbb{R}^{p \times n}$ and $W_2 \in \mathbb{R}^{m \times p}$ are the weights in the first and second layers of the network respectively, and $A_1: \mathbb{R}^p \to \mathbb{R}^p$ and $A_2: \mathbb{R}^m \to \mathbb{R}^m$ are the activation functions of the first and second layers respectively.


Figure 2.2: An illustration of a multilayer perceptron [14], in the form of equation 2.25, which is a type of multilayer neural network. The leftmost black dots in the image represent the input $x \in \mathbb{R}^3$. The arrows extending from the input to the hidden layer represent the weights $W_1 \in \mathbb{R}^{3\times3}$. The grey circles of the hidden layer represent the activation function $A_1: \mathbb{R}^3 \to \mathbb{R}^3$. The arrows extending to the output layer represent the weights $W_2 \in \mathbb{R}^{2\times3}$. The grey circles of the output layer represent the activation function $A_2: \mathbb{R}^2 \to \mathbb{R}^2$. Finally, the arrows extending to the right of the output layer represent the output $\hat{y} \in \mathbb{R}^2$.

Neural networks are powerful approximators due to their ability to distinguish data that is not linearly separable, in contrast to multivariate linear regressors. This capability comes from the use of the non-linear activation functions. The classical example of an activation function is the hyperbolic tangent,

$$A(\mathrm{input}_{n,i}) = \tanh(\mathrm{input}_{n,i}), \qquad (2.26)$$

where $\mathrm{input}_{n,i} = (W_n x_n)_i$ is the input to node $i$ in layer $n$, or the sigmoid function,

$$A(\mathrm{input}_{n,i}) = \frac{1}{1 + \exp(-\mathrm{input}_{n,i})} = \sigma(\mathrm{input}_{n,i}). \qquad (2.27)$$

The activation functions of equations 2.26 and 2.27 work well in the sense that they achieve the non-linearity needed for the neural network to be a good approximator, but they are both computationally expensive. An alternative activation function which achieves the same representative power as the aforementioned functions at a much smaller computational cost is the Rectified Linear Unit (ReLU),

$$A(\mathrm{input}_{n,i}) = \begin{cases} 0, & \text{for } \mathrm{input}_{n,i} < 0, \\ \mathrm{input}_{n,i}, & \text{for } \mathrm{input}_{n,i} \ge 0. \end{cases} \qquad (2.28)$$
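As a concrete illustration of equations 2.25 and 2.28, the following is a minimal NumPy sketch of the forward pass of a two-layer network with ReLU activations. The layer sizes and weights are arbitrary illustration values, not the networks used later in the thesis.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit, applied element-wise (eq. 2.28)."""
    return np.maximum(z, 0.0)

def forward(x, W1, W2):
    """Two-layer network y_hat = A2(W2 A1(W1 x)) with ReLU activations (eq. 2.25)."""
    hidden = relu(W1 @ x)      # first layer: weights then activation
    return relu(W2 @ hidden)   # second layer: weights then activation

# Illustrative usage with random weights (hypothetical sizes n=4, p=8, m=3).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=4)
print(forward(x, W1, W2))
```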

The optimization of a neural network is known as training. Similarly to the multivariate linear regressor, the accuracy of a neural network is determined with the use of a loss function. In the case of the linear regressor the loss function was simply the $L_2$-norm of the distance between the prediction of the regressor and the actual output vector $y$. In the training of neural networks, typical loss functions are the $L_2$-norm and the cross entropy; however, in this thesis only the former is considered.

Given a neural network with an arbitrary number of layers, the network output can be expressed in terms of its input as

$$\hat{y} = f_n(f_{n-1}(\ldots f_2(f_1(x)))), \qquad (2.29)$$

where $f_i$ represents the multiplication by the weights and the activation function of layer $i$. The loss of a prediction $\hat{y}$, given input $x$ and actual output $y$, is given by

$$L(\hat{y}, y) = L\big(f_n(f_{n-1}(\ldots f_2(f_1(x)))), y\big). \qquad (2.30)$$

The accuracy of the neural network increases as the loss function decreases. By choosing the weights $W_n$ such that they minimize the loss function, the accuracy of the neural network is maximized. As with multivariate linear regressors, the gradient descent method is suitable for this task, and as in that case the gradient of the loss function must be calculated. In neural networks this is done by an algorithm called backpropagation, which calculates the gradient of the loss function with respect to all the weights in the network. The weights are then adjusted according to

$$W_{i+1} = W_i - \alpha \nabla_W L(\hat{y}, y), \qquad (2.31)$$

where once again the parameter $\alpha$ is known as the learning rate.

Since the loss function of the neural network, unlike that of the multivariate linear regressor, is not convex, the minimum found by the gradient descent algorithm is in general not the global minimum of the loss function. It is, however, generally good enough for the purposes of the network.
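The following is a minimal sketch of one backpropagation and gradient descent update (equation 2.31), in NumPy, for a one-hidden-layer network with a ReLU hidden layer, a linear output layer and the squared $L_2$ loss; the architecture and loss are chosen for illustration and are not the thesis networks.

```python
import numpy as np

def backprop_step(x, y, W1, W2, alpha=1e-3):
    """One gradient descent update (eq. 2.31) for y_hat = W2 relu(W1 x)."""
    # Forward pass, keeping intermediate results for the backward pass.
    z1 = W1 @ x
    h = np.maximum(z1, 0.0)          # ReLU hidden layer
    y_hat = W2 @ h                   # linear output layer

    # Backward pass: propagate the loss gradient layer by layer.
    dy = 2.0 * (y_hat - y)           # dL/dy_hat for L = ||y_hat - y||^2
    dW2 = np.outer(dy, h)            # dL/dW2
    dh = W2.T @ dy                   # gradient w.r.t. the hidden activations
    dz1 = dh * (z1 > 0.0)            # the ReLU derivative gates the gradient
    dW1 = np.outer(dz1, x)           # dL/dW1

    # Gradient descent update of both weight matrices.
    return W1 - alpha * dW1, W2 - alpha * dW2
```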

2.3.3 Convolutional neural networks

A convolutional neural network (CNN) is a type of neural network which is specialized for input data that resides on a grid. CNNs have had huge success in the domain of image recognition through their ability to find spatial patterns in data. The name is inspired by the mathematical convolution operation,

$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a), \qquad (2.32)$$


with which the neural network convolution operation shares some similarities.

The neural network convolution operation instead takes an input, $I \in \mathbb{R}^{n \times m \times \ldots}$, which resides on an N-dimensional grid, and applies a kernel, $K \in \mathbb{R}^{n' \times m' \times \ldots}$, to this input. The sizes of the input and kernel must obey $n' \le n$, $m' \le m$ and so on. In 2D the convolution operation is

$$S(i, j) = (I * K)(i, j) = \sum_m \sum_n K(m, n)\, I(i - m, j - n), \qquad (2.33)$$

where each extra dimension adds an additional sum and index to the operation. It is typical to also apply an activation function to the output of the convolution operation, as it gives the same benefits as it does for feedforward networks.

The convolution operation is used in a layer in place of the weights defined in the previous section. Each convolutional layer consists of several kernels, and each kernel output is known as a feature map. The kernel elements $K(i, j)$ take the place of the weights, and these are the parameters that are optimized in convolutional layers. Since the sum of equation 2.33 runs over $I(i - m, j - n)$ it is ill-defined at e.g., $S(0, 0)$. If the convolution is not calculated at these points, the output of the convolutional layer becomes smaller than its input. The way this is handled is known as padding. There are two types of padding, valid padding and same padding. Valid padding consists of accepting the loss in size of the output of the layer by not calculating the convolutions which would map to points outside the input. Same padding consists of padding the boundary of the input with zeros such that any invalid index yields a zero element.

In equation 2.33 a stride of unity is assumed, that is, the convolution kernel is applied at each pair of indices $(i, j)$ of the input. The stride of a convolutional layer can be increased such that the kernel is applied only at a subset of the indices, separated by a pre-defined interval. If a stride of [2, 2] were used in a convolutional layer, the convolution would only be calculated at every second index of the input. Note that this also changes the size of the output: if a stride of [2, 2] were used with same padding, the output would be half as large as its input in each dimension.
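As a sketch of equation 2.33 with stride and the two padding modes, a direct (unoptimized) NumPy implementation could look as follows. Note that it uses cross-correlation indexing $I(i + m, j + n)$, as most neural network libraries do, rather than the flipped-kernel indexing of the equation, and that the simple symmetric zero padding shown here reproduces "same" behaviour only for odd kernel sizes.

```python
import numpy as np

def conv2d(image, kernel, stride=(1, 1), padding="valid"):
    """2D convolutional layer operation in the spirit of eq. 2.33.

    Uses cross-correlation indexing; flip the kernel to obtain the textbook
    convolution. `padding` is either "valid" or "same"."""
    kh, kw = kernel.shape
    sh, sw = stride
    if padding == "same":
        # Zero-pad so that, with unit stride and an odd kernel,
        # the output matches the input size.
        ph, pw = kh // 2, kw // 2
        image = np.pad(image, ((ph, ph), (pw, pw)))
    out_h = (image.shape[0] - kh) // sh + 1
    out_w = (image.shape[1] - kw) // sw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * sh:i * sh + kh, j * sw:j * sw + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Illustrative usage: a stride of [2, 2] with same padding halves the output
# size in each dimension (8x8 input, 3x3 averaging kernel).
img = np.arange(64, dtype=float).reshape(8, 8)
k = np.ones((3, 3)) / 9.0
print(conv2d(img, k, padding="same").shape)                 # (8, 8)
print(conv2d(img, k, stride=(2, 2), padding="same").shape)  # (4, 4)
```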


A typical convolutional neural network consists of a number of convolutional layers, each followed by an activation function and a pooling layer. The output is then fed into a few fully connected layers, that is, the hidden layers as they were defined in the previous section. An illustration of a CNN with convolutional layers, subsampling pooling layers and fully connected layers can be seen in figure 2.3.

Figure 2.3: A typical convolutional neural network [15]. A convolutional layer with 4 kernels is applied to the RGB image input. The output of this layer is then subsampled using max pooling. Another convolutional layer is applied, followed by another subsampling. The output of the last subsampling is fed into a fully connected hidden layer which is mapped to the output.

2.3.4 Computational complexity of neural networks

The computational complexity of any given multilayer perceptron, or of a convolutional neural network consisting only of convolutional layers, pooling layers and fully connected layers, is linear in its number of inputs. For a fully connected layer the number of operations is

$$N_{\mathrm{op}}^{\mathrm{conn}} = N_{\mathrm{in}} N_{\mathrm{out}}, \qquad (2.34)$$

where $N_{\mathrm{in}}$ is the number of inputs and $N_{\mathrm{out}}$ the number of units in the fully connected layer.

In a convolutional layer each kernel performs

$$N_{\mathrm{op}}^{\mathrm{kernel}} = K_w K_h K_d N_{\mathrm{ch}} \qquad (2.35)$$

operations on each input grid node, where $K_w$, $K_h$, $K_d$ are the kernel width, height and depth respectively and $N_{\mathrm{ch}}$ is the number of input channels. Thus the total number of operations performed by a 3D convolutional layer is

$$N_{\mathrm{op}}^{\mathrm{conv}} = N_x N_y N_z N_{\mathrm{out}} N_{\mathrm{op}}^{\mathrm{kernel}} / (S_x S_y S_z), \qquad (2.36)$$

where $N_x$, $N_y$, $N_z$ are the number of grid points in each dimension respectively, $S_x$, $S_y$, $S_z$ are the strides in each dimension and $N_{\mathrm{out}}$ is the number of kernels of


the convolutional layer.

The pooling layers are similar to the convolutional layers in that they apply a kernel to each input grid node; however, the kernels do not sum over the input channels. Thus, the number of operations performed by a pooling layer is given by

$$N_{\mathrm{op}}^{\mathrm{pool}} = K_w K_h K_d N_x N_y N_z N_{\mathrm{ch}} / (S_x S_y S_z), \qquad (2.37)$$

with notation as above. The total number of operations performed by the network is thus

$$N_{\mathrm{op}}^{\mathrm{net}} = \sum_i N_{\mathrm{op}}^{\mathrm{conn},i} + \sum_j N_{\mathrm{op}}^{\mathrm{conv},j} + \sum_k N_{\mathrm{op}}^{\mathrm{pool},k}, \qquad (2.38)$$

summed over the network's layers with their respective parameters.
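The following sketch turns equations 2.34-2.38 into a small operation-count estimator in Python; the example layer parameters at the end are hypothetical and only illustrate the bookkeeping.

```python
def ops_fully_connected(n_in, n_out):
    """Eq. 2.34: operations of a fully connected layer."""
    return n_in * n_out

def ops_conv3d(grid, kernel, n_ch, n_out, stride=(1, 1, 1)):
    """Eqs. 2.35-2.36: operations of a 3D convolutional layer."""
    kw, kh, kd = kernel
    nx, ny, nz = grid
    sx, sy, sz = stride
    ops_per_kernel = kw * kh * kd * n_ch
    return nx * ny * nz * n_out * ops_per_kernel // (sx * sy * sz)

def ops_pool3d(grid, kernel, n_ch, stride):
    """Eq. 2.37: operations of a 3D pooling layer."""
    kw, kh, kd = kernel
    nx, ny, nz = grid
    sx, sy, sz = stride
    return kw * kh * kd * nx * ny * nz * n_ch // (sx * sy * sz)

# Eq. 2.38 for a hypothetical network on a 48x48x20 grid with 2 input channels.
grid = (48, 48, 20)
total = (
    ops_conv3d(grid, (2, 2, 2), n_ch=2, n_out=8)
    + ops_pool3d(grid, (2, 2, 2), n_ch=8, stride=(2, 2, 2))
    + ops_fully_connected(8 * 48 * 48 * 20, 384)
    + ops_fully_connected(384, 3)
)
print(total)
```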


3 Method

As stated in the introduction, the aim of this thesis was to consider a number of approaches and to work towards implementing the most promising one. This section is therefore mainly divided into two parts: the first part consists of the reasoning and conclusions regarding the choice of approach, and the second part covers the actual work towards the implementation of the chosen approach.

3.1 Choice of approach

A data-driven model reduction approach would consist of training a machine learning algorithm on a large amount of simulated data, teaching it to predict certain parts of the dynamics of a later live simulation. The machine learning algorithm could be used to predict the force on the bulldozer blade given a certain granular particle velocity and density field, and even to predict the actual time evolution of the particle velocity field. This approach would reduce the degrees of freedom down to the number of sampled points needed for the velocity and density fields, and instead of solving linear systems, a prediction during a live simulation is simply a series of matrix-matrix and matrix-vector multiplications.

However, as stated in the introduction, several approaches were considered. The interested reader can continue with this section for a brief discussion of each considered approach; otherwise, feel free to skip ahead to the next section.

The proposed approaches for an improved particle-based terrain model were:

• A solution for direct-iterative split where the dynamics of a bulldozer or exca-vator is resolved by a direct solver and the granular particles by an iterative solver with an iterative connection between them

• The application of projection at the position update of the granular particles in an attempt to reduce artificial compaction of the particle system.


• A new approach to model reduction that is tailored for use in bulldozing and excavation with minimal overhead and memory footprint. It can either be model-based or data-driven.

The following subsections will explore the pros and cons of the different approaches and explain the reasoning behind the choice of focusing on data-driven model reduction.

3.1.1 Direct-iterative splitting

Direct-iterative splitting consists of solving parts of the system using a direct solver and other parts with an iterative solver. The connections between the direct and iterative parts are then solved using an iterative method. One method for direct-iterative splitting of systems is already implemented in AGX Dynamics, and while improvements might be possible, they would mainly increase precision rather than speed.

3.1.2 Projection at position update

Since the PGS solver is an iterative solver, the solution given is never exact. The error in the solution leads to constraints still being violated after solving, and these violations lead to artificial compaction of the granular particle system. Using a projection method to resolve these positional violations could increase accuracy cheaply; with this in mind it would be possible to use fewer iterations while keeping artificial compaction low. However, the performance gains would still not be in the region needed to approach real time.

3.1.3 Pre-conditioning of iterative solver

The convergence rate of the PGS method is governed by the spectral radius of the iteration matrix of the Schur complement system of equation 2.13. The reasoning behind pre-conditioning is that there should exist a matrix P that can be pre-multiplied to both sides of the linear system such that the properties of the system to be solved become more favourable. In the case of PGS this would mean finding a matrix P that reduces the spectral radius. Even if this were able to cut the number of iterations needed for a certain precision in half, the resulting speed would be nowhere close to real time.

3.1.4 Conjugate Gradient solver


the condition number of the matrix. While a massively multi-threaded GPU implementation of a CG solver could lead to tremendous speed increases in solving the dynamics of granular particle systems, this task is too large and complicated for the scope of this thesis.

3.1.5 Model-based model reduction

A model-based model reduction of the earthmoving scenario could be to model the granular particles as a continuum and solve the resulting continuum equations on some discretized domain. While this is a feasible approach, there would need to be some way of moving between the continuum domain and the discrete particle domain. This transformation is not trivial, and there is no guarantee that the number of volume discretizations needed is lower than the number of particles residing in that same volume.

3.2 Training and validation data

In order to optimize the machine learning algorithm for the problem at hand, training and validation data is needed. The multivariate linear regression analysis and the neural networks both need a large amount of training data to calibrate their models. The validation data is no different from the training data except that it is withheld from the machine learning algorithm during training. After sufficient training is completed, the validation data is used to validate the generality of the trained model.

Gathering the training and validation data consists of several steps. First, a procedure for gathering the velocities and densities of the granular particles needs to be developed. Secondly, since the amount of data needed is very large, an efficient storage method must be constructed. Thirdly, particle beds corresponding to the considered scenarios need to be constructed. Lastly, a method for finding suitable bulldozer blade trajectories must be designed.

3.2.1 Field sampling

The current sampling method consists of sampling the velocity and density field in a finite volume in front of the bulldozer blade, divided into a finite number of subdivisions, henceforth called bins. The sampling volume is defined by its size, $L = (L_x, L_y, L_z)$, and the number of bins in each dimension, $N = (N_x, N_y, N_z)$. The individual size of a bin is thus given componentwise by $l = (L_x/N_x, L_y/N_y, L_z/N_z)$.

Figure 3.1: A visualization of the sampling volume and its subdivisions (bins). Each colored box corresponds to one bin. The size of the bins in this figure is exaggerated for clarity; a much smaller bin size is used in practice.

The size is chosen such that it completely covers the bulldozer blade's width and height. It is extended a short distance downwards toward the ground such that it includes granular particles positioned slightly below the blade. This ensures that the data captures whether the sampled particles are in free fall or whether they are being pushed along the granular material, which is especially important in the scenario with the down step. A visualization of the bins and the sampling volume is shown in figure 3.1, where a subdivision N = (4, 2, 2) has been used.

The particles are sorted into bins using the built-in collision system in AGX Dynamics. Each bin is represented by a geometry which registers contacts that are non-interacting. These contacts are used to find which particles reside in each bin. In the case where a particle is partly in several bins, weights are used to represent "how much" of the particle is in each colliding bin. Since finding the penetration volume between a sphere and a cube is a computationally complex problem, axis-aligned bounding boxes (AABBs) are used to calculate the fraction of the particle's volume residing in each bin. An illustration of the weight calculation process can be seen in figure 3.2. The AABB approach introduces an error for particles residing on the boundaries of each bin. However, the method guarantees that any particle fully inside the sampling volume always gets a summed weight of unity. Using weights is necessary to eliminate transient effects where a particle is periodically moving on the boundary between bins. The transient effect is most noticeable when a particle is moving in and out of an otherwise empty bin, giving a potentially large velocity delta in the velocity field of that bin between subsequent time steps. The complete algorithm for calculating the particle-bin weight is shown in algorithm 1.

Figure 3.2: A 2D illustration of the case where a particle is partially in different bins. The four squares represent the bins. The dashed square around the circle represents the axis-aligned bounding box used to calculate the fraction of the particle's volume in each bin.

Algorithm 1: Particle-bin weight

Input: Bin position p_b, particle position p_p, particle radius r
Constants: Bin size l
Output: Particle-bin weight w
begin
    // Find absolute distance per axis between centroids of particle and bin
    p ← p_p − p_b
    p_x ← |p_x|, p_y ← |p_y|, p_z ← |p_z|
    // Find distance from each side of the AABB to the corresponding bin side
    d ← (r, r, r) + ½ · l − p
    // Clamp the result in each axis between 0 (no penetration) and 2r (complete overlap)
    d ← clamp(d, (0, 0, 0), (2r, 2r, 2r))
    // The volume overlap is simply the box with the overlaps as sides
    ΔV ← d_x · d_y · d_z
    // The weight is the fraction of the AABB's volume that is overlapping
    w ← ΔV / (8 · r³)
end
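A minimal NumPy version of algorithm 1 could look as follows; the function name and vector conventions are my own, and the bin size l is assumed to be a 3-vector as in the sampling-volume definition above.

```python
import numpy as np

def particle_bin_weight(bin_pos, particle_pos, radius, bin_size):
    """Fraction of a particle's AABB volume overlapping a bin (algorithm 1)."""
    # Absolute distance per axis between the centroids of particle and bin.
    p = np.abs(np.asarray(particle_pos) - np.asarray(bin_pos))
    # Distance from each side of the particle's AABB to the corresponding bin side.
    d = radius + 0.5 * np.asarray(bin_size) - p
    # Clamp between 0 (no penetration) and 2r (complete overlap) in each axis.
    d = np.clip(d, 0.0, 2.0 * radius)
    # The overlap volume is the box with the per-axis overlaps as sides.
    dV = np.prod(d)
    # The weight is the fraction of the AABB volume (2r)^3 that overlaps the bin.
    return dV / (8.0 * radius**3)

# Illustrative usage: a particle of radius 0.05 centred on the face between
# two bins of size 0.1 gets a weight of 0.5 in each (hypothetical numbers).
print(particle_bin_weight((0.0, 0.0, 0.0), (0.05, 0.0, 0.0), 0.05, (0.1, 0.1, 0.1)))
```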

When the particles have been sorted into bins, along with their respective weights, the bin velocity and density can be calculated. The velocity field is constructed by calculating the weighted mean velocity of the particles in each bin,

$$v_{\mathrm{bin}} = \frac{1}{N} \sum_{p \,\in\, \mathrm{bin}} w_p \cdot v_p, \qquad (3.1)$$

where $v_{\mathrm{bin}}$ is the bin velocity, $N$ is the number of particles in the bin, $w_p$ is the particle-bin weight of particle $p$ and $v_p$ is the velocity of particle $p$.

The density field is found similarly, but instead summing the mass of each particle in a bin weighted by its particle-bin weight. The density is then found by dividing by the bin's volume,

$$\rho_{\mathrm{bin}} = \frac{1}{V_{\mathrm{bin}}} \sum_{p \,\in\, \mathrm{bin}} w_p \cdot m_p, \qquad (3.2)$$

where $\rho_{\mathrm{bin}}$ is the particle density in the bin, $V_{\mathrm{bin}}$ is the bin volume and $m_p$ is the mass of particle $p$.
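Assuming particles have already been assigned to bins together with their weights, the two field quantities of equations 3.1 and 3.2 can be accumulated as in the following sketch; the data layout is my own choice for illustration.

```python
import numpy as np

def sample_bin_fields(assignments, bin_volume):
    """Compute the bin velocity (eq. 3.1) and density (eq. 3.2).

    `assignments` is a list of (weight, velocity, mass) tuples for the
    particles overlapping one bin; velocity is a 3-vector."""
    if not assignments:
        return np.zeros(3), 0.0
    weights = np.array([w for w, _, _ in assignments])
    velocities = np.array([v for _, v, _ in assignments])
    masses = np.array([m for _, _, m in assignments])
    v_bin = (weights[:, None] * velocities).sum(axis=0) / len(assignments)
    rho_bin = float((weights * masses).sum() / bin_volume)
    return v_bin, rho_bin

# Illustrative usage with two particles in one bin (hypothetical values).
parts = [(1.0, np.array([0.5, 0.0, 0.0]), 0.2),
         (0.5, np.array([1.0, 0.0, 0.0]), 0.2)]
print(sample_bin_fields(parts, bin_volume=0.001))
```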

In order to study how the velocity and density field of the granular particles affect the force on the bulldozer blade, the contact forces between the granular particles and the blade have to be collected. This is achieved using the collision system of AGX Dynamics, extracting and summing the contact forces of any contacts between granular particles and the bulldozer blade. In addition, the velocity of the bulldozer blade was recorded in each time step, so that the information can be used to move between the resting frame and the blade's frame when analysing the velocity field.


Algorithm 2: Field sampling

Input: bins[], contacts[], timestep t, blade velocity v_blade
Output: Velocity field, density field, blade contact force f
begin
    f ← 0
    for contact in contacts[] do
        if contact between particle and blade then
            f ← f + contact.force
        end if
        if contact between particle and bin then
            p ← contact.particle
            if contact.depth < 2 · p.radius then
                w ← calculate_weight(bin.pos, p.pos, p.radius)
                store(bin, p, w)
            else
                store(bin, p, w = 1)
            end if
        end if
    end for
    // Print quantities to file
    write(t, v_blade, f)
    for bin in bins[] do
        v_bin ← (1/N) · Σ_{p ∈ bin} w_p · v_p
        ρ_bin ← (1/V_bin) · Σ_{p ∈ bin} w_p · m_p
        write(v_bin, ρ_bin)
    end for
end

3.2.2 Efficient storage


The amount of data that needs to be stored for every time step of the simulation is proportional to the number of bins used for field sampling. Each bin contributes four floating point numbers (floats) per time step. In addition, three floats are needed to store the contact forces, three floats are needed to store the blade velocity and one additional float is needed to store the time of the current time step. By using 4 byte floats, the total number of bytes needed by a simulation is

$$S = 4 N_t (4 N_x N_y N_z + 7), \qquad (3.3)$$


where $N_t$ is the number of time steps of the simulation.

A typical number of bins used in the simulations performed during this project was $N = (48, 48, 20)$ with $N_t = 5 \cdot 10^3$, resulting in $S \approx 3.69 \cdot 10^9$ bytes, or 3.69 gigabytes. However, storing the same information in formatted text form would require at least 2.5 times as much storage space. With this in mind the data was stored as a binary file, and a simple script was written to extract the data in the programming language Python. See appendix A for a detailed specification.
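As an illustration of such an extraction script, the sketch below reads one time step at a time with NumPy; the exact record layout (time, blade velocity, blade force, then four floats per bin) is an assumption based on the description above, not the specification in appendix A.

```python
import numpy as np

def read_timesteps(path, n_bins, dtype=np.float32):
    """Yield (time, blade_velocity, blade_force, fields) records from a binary file.

    Assumed layout per time step: 1 float time, 3 floats blade velocity,
    3 floats blade force, then 4 floats (vx, vy, vz, rho) per bin."""
    floats_per_step = 7 + 4 * n_bins
    with open(path, "rb") as f:
        while True:
            raw = np.fromfile(f, dtype=dtype, count=floats_per_step)
            if raw.size < floats_per_step:
                break  # end of file (or truncated final record)
            t = float(raw[0])
            v_blade = raw[1:4].copy()
            f_blade = raw[4:7].copy()
            fields = raw[7:].reshape(n_bins, 4)  # per-bin velocity and density
            yield t, v_blade, f_blade, fields

# Illustrative usage for a 48x48x20 bin grid (hypothetical file name):
# for t, v_blade, f_blade, fields in read_timesteps("hill_run.bin", 48 * 48 * 20):
#     ...
```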

3.2.3 Construction of training scenarios

Two typical earthmoving scenarios were considered as part of the thesis: the flattening of a hill or mound, and the flattening of a steep down step. In both scenarios the bulldozer blade should cut into the granular material, push it to where it should be deposited and then spread it out by successively lifting the blade.

In order to construct a proper granular particle bed, the particles had to be created and their dynamics simulated as the bed is formed. This was done in AGX Dynamics using AGX Granular. In order to get very accurate results during the actual field sampling, a small time step and a high iteration count of the iterative solver were needed. The settings used to create the bed were $\Delta t = 1/100$ s and $N_{\mathrm{it}} = 500$. After the bed was finalized, the simulation was left to run for a few more seconds of simulated time to allow the particles in the bed to come to rest. The simulation was then serialized to file.

AGX Granular allows the user to create particles within a user-defined geometry. The particles are created uniformly within this geometry, and thus creating a non-uniform particle bed is a bit tricky. To accomplish this, a rectangular thin strip was used as the particle creation geometry, and this geometry was moved at different speeds to create non-uniform particle beds. The resulting particle beds are shown in figures 3.3 and 3.4.


Figure 3.3: The granular particle bed corresponding to the hill scenario. The aim in this scenario is to flatten the hill such that the whole granular particle bed becomes flat.

Figure 3.4: The granular particle bed corresponding to the steep down step scenario. The aim in this scenario is to even out the height difference such that the whole granular particle bed becomes flat.

3.2.4 Algorithm for simulating training data


An algorithm was devised to construct a blade trajectory and to produce blade velocities along said trajectory.

The first step towards constructing a blade trajectory is to extract the height of the granular particle bed. This is done by dividing the bed into a finite number of subdivisions. The maximum particle height is found in each subdivision, which produces a discrete height function $h(x, y)$, where $x$ and $y$ are the directions perpendicular to gravity. By restricting the blade to move in the plane spanned by the gravitational direction $z$ and the $x$-direction, the height field can be collapsed along the $y$-direction to produce a new height function

$$h(x) = \frac{1}{N_y} \sum_y h(x, y), \qquad (3.4)$$

where $N_y$ is the number of subdivisions in the $y$-direction.

Since the height field is a discrete function created by sampling the height of discrete particles, it produces a very rough curve, as seen in figure 3.5.


In order to create a smooth blade trajectory, the height field was smoothed using two different moving average filters. The first filter reduces the high frequency deviations between neighbouring height field points using a narrow filter size; it was applied three times, with a width of first one, then two and lastly four points. The second filter first calculates the moving standard deviation of the height field and then filters only regions with low standard deviation, but with a large filter size. This filter was applied two times, first with a width of 8 points and a standard deviation threshold of 0.1, and then with a filter width of 64 points but a stricter threshold of 0.01. The second filter thus flattens out regions which are already close to flat but preserves the regions of high slope. Figure 3.6 shows the earlier height field along with its filtered curve.

Figure 3.6: An extracted height field from the steep down step scenario along with its filtered counterpart. The x-direction is the forward direction of the blade and the z-direction is in the direction of gravity.
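A sketch of the two smoothing passes described above is given below; the boundary handling, the interpretation of the filter widths as half-window sizes and the exact pass ordering are my own simplifications, with the widths and thresholds taken from the text.

```python
import numpy as np

def moving_average(h, width):
    """Centred moving average with a window of (2*width + 1) points."""
    kernel = np.ones(2 * width + 1) / (2 * width + 1)
    return np.convolve(h, kernel, mode="same")

def moving_std(h, width):
    """Moving standard deviation over the same centred window."""
    mean = moving_average(h, width)
    return np.sqrt(np.maximum(moving_average(h * h, width) - mean**2, 0.0))

def smooth_height_field(h):
    """Two-stage smoothing of a collapsed height field h(x)."""
    # First filter: narrow moving averages against high-frequency noise.
    for width in (1, 2, 4):
        h = moving_average(h, width)
    # Second filter: wide averaging only where the field is already nearly flat.
    for width, threshold in ((8, 0.1), (64, 0.01)):
        flat = moving_std(h, width) < threshold
        h = np.where(flat, moving_average(h, width), h)
    return h
```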


d as the cut depth. The blade trajectory then follows the filtered height field at the cut depth until the granular bed's height is lower than the goal height. At this point the trajectory rises above the height field by a distance equal to the cut depth, such that the granular particles in front of the blade are dispersed below. The blade trajectory following the previous filtered height field, with a cut depth of d = 0.12 m and a goal height of 0.8 m, can be seen in figure 3.7.

Figure 3.7: A filtered height field along with a blade trajectory of cut depth 0.12 m and goal height of 0.8 m. The x-direction is the forward direction of the blade and the z-direction is in the direction of gravity.

The simulated bulldozer blade is controlled by setting its velocity between time steps. Thus velocities must be found that allow the blade to follow the calculated trajectory. The blade should move at a constant speed along the trajectory, which means that the distance travelled during a time step is $\Delta|x| = \Delta t \cdot |v|$; that is, the blade moves the same distance along the trajectory in each time step. An inverse arc length was implemented as a function of $x$, such that for each point along the arc length the corresponding $x$ value could be found. A sequence of length values is created, $L_i = i \cdot \Delta|x|$, and the corresponding $x$-values are calculated using the inverse arc length function, $x_i = x(L_i)$. The velocity to be set in each time step is then simply $v_i = (p_{i+1} - p_i)/\Delta t$, where $p_i$ is the trajectory point at $x_i$. The resulting velocities along a blade trajectory are shown in figure 3.8, and the general outline of the algorithm to calculate the velocities is found in algorithm 3.

Figure 3.8: The resulting velocities along a blade trajectory. The blade moves with each velocity during one time step, at which time it has arrived at the next point on the trajectory.

Algorithm 3: Blade trajectory and velocities

Input: particles[], cut depth d, goal height h
Constants: Blade speed s
Output: Blade velocities vel[]
begin
    heightField[][] ← zeros(N_x, N_y)            // Initialization
    for each particle in particles do
        (i, j) ← getSubdivision(particle)
        if particle.z > heightField[i, j] then
            heightField[i, j] ← particle.z
        end if
    end for
    heightField.collapse_y()                      // Collapse along the y-axis
    applyFilters(heightField)                     // Apply moving average filters

    trajectory ← calculateTrajectory(heightField, d, h)
    vel[] ← calculateVelocities(trajectory, s)
end
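A sketch of the calculateVelocities step of algorithm 3 is given below, resampling a trajectory at equal arc-length intervals via the inverse arc length and differencing the points over the time step; the function signature and the use of linear interpolation are my own assumptions.

```python
import numpy as np

def calculate_velocities(trajectory, speed, dt):
    """Velocities that move a blade at constant speed along a 2D trajectory.

    `trajectory` is an (n, 2) array of (x, z) points; one velocity is returned
    per time step, v_i = (p_{i+1} - p_i) / dt."""
    traj = np.asarray(trajectory, dtype=float)
    # Cumulative arc length along the trajectory polyline.
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(seg)))
    # Equally spaced arc-length samples, one per time step: L_i = i * dt * speed.
    L = np.arange(0.0, arc[-1], dt * speed)
    # Inverse arc length by linear interpolation, applied per coordinate.
    points = np.column_stack([np.interp(L, arc, traj[:, k]) for k in range(2)])
    # The velocity for each time step moves the blade to the next resampled point.
    return np.diff(points, axis=0) / dt
```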

3.3 Construction of learning algorithm


implemented using the TensorFlow™ machine learning library. While TensorFlow™ is mainly a library for neural networks, it is straightforward to construct a multivariate linear regressor as a neural network if an iterative optimization method is used. The training data produced by the simulations are 5-tensors with indices corresponding to [time step, bin x-index, bin y-index, bin z-index, property]. When training the algorithms to predict the force on the blade, the input to the algorithms was the velocity and density fields at a certain time step, i.e. a 4-tensor, and the output was the force on the blade during that time step, a 3-vector.

3.3.1 Objective of learning algorithm

The training and validation data collected are meant to be used for two kinds of machine learning algorithms: prediction of the force and torque on the blade from the granular particles, and prediction of the time evolution of the velocity field of the granular particles.

3.3.2 Optimization algorithms

As discussed previously, the gradient descent algorithm is a good first choice for optimizing neural networks. There are however alternatives with different benefits and costs. The optimization algorithms used in this thesis were the gradient descent algorithm and the Adam algorithm (the name derived from ADAptive Moment estimation) [16]. The Adam algorithm is especially useful when working with networks with noisy gradients.

3.3.3 Multivariate linear regression

The multivariate linear regressor was implemented as a two layer neural network without activation functions. The input layer consisted of the velocity and density field 4-tensor flattened into a vector, $x \in \mathbb{R}^{4 N_x N_y N_z}$, and the output layer was a vector $\hat{y} \in \mathbb{R}^3$. The layers were fully connected, such that each element of the output was a weighted sum of all the input elements. A variant of the linear regressor was also implemented which only included the x-component of the velocity field along with the density field and mapped to a single output, the x-component of the force.

3.3.4 Multi-layer perceptron


3.3.5 Convolutional Neural Network

The architecture of the convolutional neural network was heavily inspired by the one used by [2] to predict the pressure in a fluid given the divergence of its velocity. The input and output were the same as for the MLP. The CNN used in this thesis is a 3D CNN and its structure is as follows (a hedged code sketch of the architecture follows the list):

• Input layer - 4-tensor of size [Nx, Ny, Nz, 2]
• Convolutional layer with 8 kernels of size [2, 2, 2]
• Three parallel pooling layers:
  – No pooling
  – Pooling with a filter size of [2, 2, 2] and stride of [2, 2, 2], leading to a downsampling of factor 2
  – Pooling with a filter size of [4, 4, 4] and stride of [4, 4, 4], leading to a downsampling of factor 4
• Three parallel convolutional layers, all with 8 kernels of size [2, 2, 2]
• Upsampling of the downsampled layers to the original size
• Addition of all parallel layers
• Convolutional layer with 8 kernels of size [1, 1, 1]
• Fully connected hidden layer with 384 units
• Fully connected hidden layer with 192 units
• Output layer
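The following is a minimal sketch of this architecture using the Keras API of TensorFlow; it is not the thesis implementation, and details such as the padding mode, activation placement, channel content and the loss are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(nx, ny, nz):
    """3D CNN with three parallel pooling branches, roughly as listed above."""
    inp = layers.Input(shape=(nx, ny, nz, 2))           # two field channels assumed
    x = layers.Conv3D(8, 2, padding="same", activation="relu")(inp)

    branches = []
    for pool in (1, 2, 4):                              # no pooling, /2 and /4 downsampling
        b = x if pool == 1 else layers.MaxPooling3D(pool_size=pool)(x)
        b = layers.Conv3D(8, 2, padding="same", activation="relu")(b)
        if pool > 1:
            b = layers.UpSampling3D(size=pool)(b)       # back to the original resolution
        branches.append(b)

    x = layers.Add()(branches)                          # addition of the parallel branches
    x = layers.Conv3D(8, 1, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(384, activation="relu")(x)
    x = layers.Dense(192, activation="relu")(x)
    out = layers.Dense(3)(x)                            # predicted blade force (3-vector)
    return tf.keras.Model(inp, out)

# model = build_cnn(48, 48, 20)
# model.compile(optimizer="adam", loss="mse")
```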


4 Results

4.1 Field sampling

In order to verify the field sampling algorithm and to study the dynamics of the granular data, a program was implemented to plot and/or animate 2D slices of the sampled velocity and density fields. Figures 4.1 and 4.2 show two such 2D plots, where the velocities are calculated relative to the resting frame of the bulldozer blade. The plots show the density field as a heat map, where darker colors correspond to greater density, and use an arrow plot to show the 2D velocity. The blade is situated at the left edge of the plot and it ends at about 3-4 field points from the bottom.

The scenario in figure 4.1 shows the filling of the bulldozer blade during the hill scenario. The slight upward motion of the granular particles just below the bulldozer blade shows that the blade is currently moving downwards. Note also the slight vorticity in the velocity field right in front of the blade. Spreading of granular particles is shown in figure 4.2; note how the particles are falling down in front of the blade.


Figure 4.2: A 2D slice of a granular velocity and density field. The velocity field is illustrated by the arrow plot and the heat plot represents the density. The forward direction is to the right and gravity points downwards. In this snapshot the blade is spreading the granular material.

During the course of this thesis, approximately 120 GiB of field data was collected, for a total of over $1.5 \cdot 10^5$ time steps. The field data is split between the hill scenario and the steep down step scenario, with varying blade speeds and cutting depths. Only a small subset of this data has been used in the following section.

4.2 Machine learning algorithms

Because of the time constraints of the thesis, only the training aspect of the machine learning algorithms was implemented. The results presented in this section focus on the algorithms' capacity to learn the field data they were given. Thus, these results only show each algorithm's capacity to represent the data it is trained on, not its capacity to predict new data. However, if a machine learning algorithm is not capable of representing the data it is trained on, it will most definitely not be able to predict similar new data either. As such, the algorithms presented here with the best capacity for training data representation are also the most likely candidates to be good predictors. The algorithms were not validated due to time constraints. While implementing validation is not complex, the time was considered better spent finding a learning algorithm capable of actually representing the data. In addition, to increase the breadth of the evaluation, the algorithms were only trained on 600 to 3000 samples of the training data. This allowed for much quicker iteration in neural network design.

4.2.1 Multivariate linear regressor


algorithms tune the weights in such a way that the loss function becomes NaN after a single iteration. The algorithms are then unable to continue the optimization due to the nature of operations with NaN results. Because of this, the approach was abandoned at an early stage.

4.2.2 Multilayer Perceptron

The multilayer perceptron showed promise in its capacity to represent its training data. The algorithm was trained with a wide set of learning rates, $\alpha \in \{10^{-2}, 10^{-3}, 10^{-6}, 10^{-9}\}$. Figure 4.3 clearly shows that a larger learning rate was superior due to its rapid decrease of the loss function. The MLP was trained using $\sim 600$ samples of the data from the hill scenario, with a blade speed of 0.9 m/s and a cutting depth of 8 cm.

Figure 4.3: The magnitude of the loss function plotted against the number of training steps while training the multilayer perceptron, using a learning rate of $\alpha = 10^{-2}$ to the left and $\alpha = 10^{-6}$ to the right.
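For reference, a learning-rate sweep of this kind can be expressed compactly with the Keras API of TensorFlow. The sketch below is only illustrative: the layer widths, the input size and the three-component output are assumptions and do not reproduce the exact MLP configuration used here.

```python
import tensorflow as tf

def build_mlp(n_inputs, n_outputs, learning_rate):
    # Two hidden layers with ReLU activations; widths chosen arbitrarily
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(n_inputs,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(n_outputs),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")
    return model

# Sweep over the learning rates compared in figure 4.3
for lr in (1e-2, 1e-3, 1e-6, 1e-9):
    mlp = build_mlp(n_inputs=4096, n_outputs=3, learning_rate=lr)
    # mlp.fit(x_train, y_train, batch_size=32, epochs=5)
```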

4.2.3 Convolutional neural networks


Figure 4.4: The loss while training the upscaling CNN using a learning rate of $\alpha = 10^{-6}$.

The same network was trained on ∼ 3000 samples of the data from the hill scenario using a blade speed of 0.9 m/s and a cutting depth of 8 cm. The order of the data was again randomized, the same data was repeated 5 times, and learning rates $\alpha = 10^{-6}$ and $\alpha = 10^{-7}$ were used. Figure 4.5 ($\alpha = 10^{-6}$) shows that the loss function quickly decreases in the first ∼ 3000 training steps but then saturates. Note that ∼ 3000 training steps is enough to step once through all the samples of the data. The result suggests that the network learns all it can from going through the samples once, with repeats not increasing the accuracy. Decreasing the learning rate in this case only slows down the learning process.

5 Discussion

5.1 Neural network architecture, optimization and training

While the area of neural networks has been under development for several decades, it is only in the last couple of years that the computational power needed for their realization has become available. The area still lacks a fundamental theory of what makes a certain network architecture better suited for one problem than another. Because of this, the construction of an accurate and efficient network has become more of an art than a science. There are many parameters to tune, such as the number of hidden layers and the kernel sizes, and further methods such as regularization that have not been explored in this thesis.


The choice of activation function could be another parameter to tune. However, when CNNs have been used for fluid simulations, ReLUs have performed very well [1], [2]. Even though [3] used sigmoid functions, no noticeable advantage of this choice over ReLUs has been shown. It is interesting to see that both the multilayer perceptron and the convolutional neural network showed similar learning speeds on the chosen dataset. Since the MLP is far shallower than the CNN, it also requires far less computational power to train, and to use for predictions at a later stage. However, the CNN performed far better than the MLP on the dataset used in the training of figure 4.4. Note that the inference¹ time of any of the networks is still small enough that it should not have a large impact on the proposed live simulation pipeline. However, as was shown previously, the number of operations performed by the network is linear in the number of inputs. Making the network larger will increase the number of operations needed, and the performance should then be measured to make sure it is still satisfactory. Caution should also be taken if there is a need to increase the resolution of the field sampling: if the number of bins is doubled in each dimension, the number of operations increases by a factor of 8.
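To make the scaling argument concrete, the multiply-accumulate count of a 'same'-padded convolution layer is proportional to the number of output grid points, so doubling the field resolution in all three dimensions multiplies the cost by $2^3 = 8$. A small sketch with assumed (not actual) grid sizes, kernel size and channel counts:

```python
def conv3d_macs(grid, kernel=(3, 3, 3), c_in=4, c_out=8):
    """Rough multiply-accumulate count for one 'same'-padded 3D convolution."""
    nx, ny, nz = grid
    kx, ky, kz = kernel
    return nx * ny * nz * kx * ky * kz * c_in * c_out

base = conv3d_macs((16, 8, 16))       # hypothetical field resolution
doubled = conv3d_macs((32, 16, 32))   # doubled in every dimension
print(doubled / base)                 # -> 8.0
```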

The program that reads the binary training data from file was implemented in Python. The data is stored time step by time step, and thus the program has to unpack the data one time step at a time and store it in corresponding NumPy² containers. The process is however very slow: reading one binary file containing 5000 time steps, with a storage size of about 4 GiB, takes in the region of 10 minutes, resulting in an effective read rate of 6-7 MiB/s. The slow reading leads to a slow iteration rate when working with the neural networks, especially for the MLP, which can perform training on all 5000 time steps in less than a hundredth of the time it takes to read the binary file from disk. This could be remedied by implementing a much more efficient reading algorithm in e.g. C++ and by using the C++ TensorFlow API instead of the Python API used during this thesis.
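Before moving to C++, a vectorized reader may already remove most of the Python overhead, since the whole file can be read in one call and reshaped according to the layout in Table A.1. The sketch below is such an alternative, not the reader used in this thesis; it assumes the four floats per bin implied by the size formula in the appendix and omits all error handling.

```python
import numpy as np

def read_field_file(path):
    """Read a field data file laid out as in Table A.1 (all values 4 bytes)."""
    raw = np.fromfile(path, dtype=np.float32)
    n_bins = raw[:3].view(np.int32)       # bins per dimension (stored as int)
    bin_size = raw[3:6]                   # bin edge lengths
    n_tot = int(np.prod(n_bins))
    floats_per_step = 10 + 4 * n_tot      # time, force, torque, velocity + bins
    steps = raw[6:].reshape(-1, floats_per_step)
    return {
        "bin_size": bin_size,
        "time": steps[:, 0],
        "contact_force": steps[:, 1:4],
        "blade_torque": steps[:, 4:7],
        "blade_velocity": steps[:, 7:10],
        "bins": steps[:, 10:].reshape(len(steps), n_tot, 4),  # per-bin fields
    }
```

A single np.fromfile call of this kind is typically disk-bound rather than interpreter-bound, which may already be sufficient before resorting to a C++ reader.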

5.2 Field sampling method

The proposed field sampling method suffers from the obvious drawback that its accuracy depends on the size of the bins used. If too large bins are used, the resolution of the velocity and density fields will be poor, but if too small a bin size is used, only parts of particles will reside in each bin. Since the partitioning of partial particles is computed approximately, the more particles that are split between bins, the larger the error becomes. If the algorithm for partitioning particles were improved to calculate the intersecting particle-bin volume more precisely, it would allow an arbitrarily fine resolution. However, a more precise algorithm would lead to increased computational complexity. During the production of training data there is no real constraint on the computation time. However, when sampling the field during a proposed live simulation, the increased computation time could prove too costly. In addition, the size of the training data files is linearly proportional to the number of bins. Thus the choice was made to use a reasonable bin size which does not lead to too many intersecting volume calculations while retaining a decent resolution in each dimension.

¹ An inference is the computation of an output given an input to a machine learning algorithm.

² NumPy is a package for scientific computing with Python.

While the field sampling method was developed to produce input to a machine learning algorithm, it is also a tool for understanding the granular dynamics in earthmoving scenarios. The Python script that was developed to plot and animate the velocity and density fields allows the user to analyse 2D slices of the dynamics between the granular particles and the earthmoving tool. Since the data files produced are essentially a lower-degrees-of-freedom representation of the granular state, they can also be used to analyse compaction and typical flow patterns, among other things. This functionality could prove useful in future work.

5.3 Trajectory algorithm

The trajectory algorithm was also a tool developed specifically to produce training data for the machine learning algorithms. However, it also offers functionality which could be usable outside of the training procedure: it allows the user to run a simulation which replicates a typical usage scenario. A possible use case is the prototyping of different bulldozer blades, or an analysis of the effects of using different cutting depths during earthmoving. The algorithm supports sampling and smoothing height fields as well as creating velocities to follow a given trajectory. A user could easily modify the trajectory created from the height field and thus have the blade follow a new trajectory with very short development time.
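As an illustration of the kind of interface this enables, the sketch below smooths a sampled height profile and resamples it into way-points and velocities at a given cutting depth and forward speed. The function names and the moving-average smoothing are hypothetical; they only mimic the functionality described above, not the actual implementation.

```python
import numpy as np

def smooth_heights(heights, window=5):
    """Moving-average smoothing of a height profile sampled along the path."""
    kernel = np.ones(window) / window
    return np.convolve(heights, kernel, mode="same")

def blade_trajectory(x, heights, cut_depth, speed, dt):
    """Way-points and velocities for a blade that follows the smoothed
    surface at a constant cutting depth and roughly constant speed."""
    z = smooth_heights(heights) - cut_depth
    waypoints = np.column_stack([x, z])
    # Resample so that consecutive way-points are speed * dt apart
    seg = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s_new = np.arange(0.0, s[-1], speed * dt)
    path = np.column_stack([np.interp(s_new, s, waypoints[:, k]) for k in range(2)])
    velocities = np.diff(path, axis=0) / dt
    return path, velocities
```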

5.4 Proposed simulation model

While the complete simulation model is still a lot of work away, the results from this thesis shed some light on whether or not it would be possible to implement. The results of the convolutional neural network approach show acceptable losses for small samples (4.5 kN out of 50 kN is an error of roughly 10 %), which is required for the proposed model to be feasible. The inference time of the networks was not measured, but the experience was that they were very quick; inference should not be a bottleneck for the model even when using a very large convolutional network.

6 Conclusion

While the approach towards speeding up simulations of earthmoving scenarios presented in this thesis shows promise, it is inconclusive whether the complete simulation pipeline would work. Only the blade-particle force prediction machine learning algorithm was implemented, and it is not clear if it is possible to achieve acceptable accuracy. However, since the training data has already been collected, it would not take much time to implement a training procedure which allows the machine learning algorithms to use all of it. If it is possible to reach sufficient accuracy and also to predict the time evolution of the velocity field, the pipeline would be feasible. At that stage a field integration method and possibly a projection method would need to be implemented.

References

[1] X. Guo, W. Li, and F. Iorio. Convolutional neural networks for steady flow approximation. In KDD '16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, August 2016.

[2] J. Tompson, K. Schlachter, P. Sprechmann, and K. Perlin. Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, 2017.

[3] C. Yang, X. Yang, and X. Xiao. Data-driven projection method in fluid simulation. Computer Animation and Virtual Worlds, 27:415–424, 2016.

[4] L. Ladický, S. Jeong, B. Solenthaler, M. Pollefeys, and M. Gross. Data-driven fluid simulations using regression forests. ACM Transactions on Graphics, 34(6), 2015.

[5] P. A. Cundall and O. D. L. Strack. A discrete numerical model for granular assemblies. Géotechnique, 29:47–65, 1979.

[6] T. Pöschel and T. Schwager. Computational Granular Dynamics, Models and Algorithms. Springer-Verlag, 2005.

[7] D. Wang. Accelerated granular matter simulation. PhD thesis, Umeå University, SE-901 87 Umeå, 2015.

[8] C. Lacoursière. Ghosts and machines: regularized variational methods for interactive simulations of multibodies with dry frictional contacts. PhD thesis, Umeå University, SE-901 87 Umeå, June 2007.

[9] M. Servin, D. Wang, C. Lacoursière, and K. Bodin. Examining the smooth and nonsmooth discrete element approaches to granular matter. International Journal for Numerical Methods in Engineering, 97:878–902, January 2014.

[10] M. Servin and D. Wang. Adaptive model reduction for nonsmooth discrete element simulation. Comp. Part. Mech., 3:107–121, 2016.

[12] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[13] Chrislb. Single layer neural network, 2015. URL https://commons.wikimedia.org/wiki/File:Single-Layer_Neural_Network-Vector-Blank.svg. [Online; accessed June 13, 2018].

[14] Offnfopt. A neural network with multiple layers, 2015. URL https://commons.wikimedia.org/wiki/File:Multi-Layer_Neural_Network-Vector.svg. [Online; accessed June 13, 2018].

[15] Aphex34. Typical CNN architecture, 2015. URL https://commons.wikimedia.org/wiki/File:Typical_cnn.png. [Online; accessed June 13, 2018].

[16] D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In Conference paper at the 3rd International Conference for Learning Representations, San Diego, USA, 2015.

Appendix

The field data is packed into a binary file with an arbitrary file extension. The data is packaged according to the structure shown in Table A.1 (the size of every data type used is 4 bytes). In total, the file contains $6 + n_{\mathrm{steps}} \cdot (10 + 4 \cdot n_{\mathrm{bins}})$ values of 4 bytes each.

Table A.1: The storage pattern of the field sampling data files.

  Once per file                    Datatype  Amount
    Number of bins                 int       3
    Size of bins                   float     3
  For each time step
    Current time                   float     1
    Particle-blade contact forces  float     3
    Blade torque                   float     3
    Blade velocity                 float     3
    For each bin
      Bin velocity                 float     3
      Bin density                  float     1
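As a consistency check of the size formula above, the snippet below evaluates the file size for 5000 time steps on a hypothetical 48 × 24 × 48 grid, which lands close to the roughly 4 GiB per 5000-step file observed in practice; the grid dimensions are an assumption, not the resolution actually used.

```python
def field_file_size_bytes(n_steps, n_bins):
    # Every stored value occupies 4 bytes (Table A.1)
    return 4 * (6 + n_steps * (10 + 4 * n_bins))

print(field_file_size_bytes(5000, 48 * 24 * 48) / 2**30)  # ~4.1 GiB
```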
