
Master of Science in Computer Science February 2020

Sales Forecasting of Truck Components using Neural Networks

Yeshwanth Reddy Gaddam

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Computer Science.

The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:

Author(s):

Yeshwanth Reddy Gaddam
E-mail: gaye17@student.bth.se
E-mail: yeshwanth.gaddam@gmail.com

University advisor:

Dr. Hüseyin Kusetogullari

Department of Computer Science

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


Abstract

Background: Sales forecasting plays a substantial role in identifying the sales trends of products for the future period in any organization. These forecasts are also important for determining profitable retail operations, meeting customer demand, maintaining storage levels and identifying probable losses.

Objectives: This study investigates machine learning algorithms appropriate for forecasting the sales of truck components, conducts experiments to forecast sales with the selected algorithms, and evaluates the performances of the models using performance metrics obtained from the literature review.

Methods: Initially, a literature review is performed to identify machine learning methods suitable for forecasting the sales of truck components. Based on the results obtained, several experiments were then conducted to evaluate the performances of the chosen models.

Results: Based on the literature review, Multilayer Perceptron (MLP), Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM) were selected for forecasting the sales of truck components, and the experiments showed that LSTM performed well compared to MLP and RNN for predicting sales.

Conclusions: From this research, it can be stated that LSTM models complex nonlinear functions better than MLP and RNN for the chosen dataset. Hence, LSTM is chosen as the most suitable model for predicting the sales of truck components.

Keywords: Sales forecasting, Artificial Neural Networks, Long Short Term Memory, Multilayer Perceptron, Recurrent Neural Network.


Acknowledgments

I would like to convey my gratitude to my supervisor Dr. Hüseyin Kusetogullari for his guidance and constructive suggestions. I could not have completed this research study without my supervisor’s constant support and encouragement. I would also like to extend my gratitude to my manager and supervisor at Volvo Group, Alain Boone and Nina Xiangni Chang, for supporting me with my thesis work.


Contents

Abstract

Acknowledgments

1 Introduction
1.1 Problem Description
1.2 Aim and Objectives
1.3 Research Questions

2 Background
2.1 Time-series Forecasting
2.1.1 Univariate Time Series
2.1.2 Multivariate Time Series
2.2 Model Selection
2.3 Forecasting Models
2.3.1 Artificial Neural Network
2.3.2 Multilayer Perceptron
2.3.3 Recurrent Neural Network
2.3.4 Long Short Term Memory

3 Related Work

4 Method
4.1 Literature Review
4.2 Experiment
4.2.1 Experimental Setup
4.2.2 Dataset
4.2.3 Data Preprocessing
4.2.4 Sliding Window
4.2.5 Kwiatkowski, Phillips, Schmidt and Shin (KPSS) Test
4.2.6 Normalization
4.2.7 Feature Importance
4.2.8 Walk Forward Validation
4.2.9 Performance Metrics

5 Results
5.1 Stationarity Test
5.2 Learning curve
5.3 Forecasts
5.3.1 Multilayer Perceptron
5.3.2 Recurrent Neural Network
5.3.3 Long Short Term Memory

6 Analysis and Discussion
6.1 Analysis of Experiment results
6.1.1 Performance analysis using Mean Absolute Error
6.1.2 Performance analysis using Root Mean Square Error
6.1.3 Key Findings
6.1.4 Discussion
6.2 Limitations
6.3 Validity threats
6.3.1 Internal validity
6.3.2 External validity
6.3.3 Conclusion validity

7 Conclusions and Future Work
7.1 Future Work

References

A Supplemental Information


List of Figures

2.1 Univariate Time Series
2.2 Multivariate Time Series
2.3 Multilayer Perceptron
2.4 Recurrent Neural Network
2.5 Long Short Term Memory
4.1 Sliding Window
4.2 Normalization
4.3 Random forest Feature importance
4.4 Walk Forward Validation
5.1 Kwiatkowski, Phillips, Schmidt and Shin Test
5.2 Learning curve
5.3 MLP forecasts
5.4 RNN forecasts
5.5 LSTM forecasts
6.1 Mean Absolute Error of machine learning models using walk forward validation
6.2 Root Mean Square Error of machine learning models using walk forward validation


List of Tables

3.1 Short Summary of the literature review results
5.1 Multilayer Perceptron algorithm performance
5.2 Recurrent Neural Network algorithm performance
5.3 Long Short Term Memory algorithm performance
6.1 Comparison of performance evaluation results


Chapter 1

Introduction

The advancement of technology has compelled major organizations to adopt a data-driven decision-making approach, making decisions by collecting and analyzing large amounts of information [1]. Retail industries have to consider many factors that influence the manufacturing process, such as logistics, cost of material, cost of labor and customer demand. These factors are complicated functions which, when forecasted accurately, help organizations to plan strategically and increase their market share. However, inaccurate predictions can result in excessive storage or a shortage of goods [2]. This has led organizations to invest heavily in constructing data-driven models, which allow them to base decisions on information rather than intuition or observation. These data-driven models provide reasonable insights into future trends, which in turn help organizations take proactive measures.

Nowadays, clients are ephemeral, switching between competitors that satisfy their demands more effectively [3]. This puts pressure on organizations to meet customer requirements or lose market share to competitors that do so. The ability to forecast accurately is one of the most important factors in supply chain management for planning and decision making. Sales forecasting plays an important role in any organization in identifying the sales trends of products for the future period [4].

These forecasts are also important for determining profitable retail operations, meeting customer demand, providing customer service, maintaining storage levels and identifying probable losses. Sales forecasting is a complicated task because various factors, such as the diversity of customer demands, competitors and environmental factors, affect the process and have to be taken into consideration to produce good forecasts [5].

Research suggests that Artificial Neural Networks (ANNs) can capture nonlinear relationships in the underlying dataset [6, 7, 8]. ANNs are also found suitable for building data-driven models to generate predictions from time series data. Over the last decade, considerable research has been conducted on the use of Artificial Neural Network algorithms such as Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory for sales forecasting [2].


In this thesis, Artificial Neural Network algorithms are used to forecast the sales of Volvo truck components, which include the engine brake, front rims, instrument cluster display, wheelbase, tank cover, auxiliary radiator, and so on. The forecasts obtained from these models can be used to optimize storage levels to meet customer demand and to identify the sales trends of the components. There are about 12,000 components, divided into families such as Core Components, Brake, Engine, Rim and Types, etc. The dataset used in this thesis consists of sales data for the Core Components family, collected on a weekly basis and obtained from the Digital Transformation department of the Volvo Trucks group.

1.1 Problem Description

In the Volvo truck sales process, quotations for trucks are made based on the customer's choices, as most of the components of a truck can be customized. Customers can choose from 11,000 different parts, which make up about 85 percent of the truck. Different customers choose different parts depending on multiple factors, such as customer preferences, region, etc. Because of such diversity, it is difficult to identify which components customers will select and to maintain stock levels accordingly. Hence, sales forecasting plays an important role in the organization in identifying the sales trends of components and optimizing storage levels. This enables the organization to optimize the truck sales process by addressing issues related to stock levels, delivery time and so on.

1.2 Aim and Objectives

This thesis aims to identify machine learning algorithms suitable for the sales forecasting problem and to find an efficient machine learning algorithm among them, based on the performance of the models on the given dataset.

Objectives

• To study the various machine learning algorithms for forecasting sales.

• To identify an efficient machine learning algorithm suitable for forecasting the sales based on the results obtained for the given dataset.


1.3 Research Questions

The research questions proposed to achieve the aim of this thesis are as follows:

RQ1: How can the machine learning models be selected to resolve the sales forecasting problem?

Motivation: The motivation of this research question is to study the machine learning algorithm(s) suitable for sales forecasting.

RQ2: Which machine learning algorithm is efficient for forecasting the sales of truck components?

Motivation: The motivation of this research question is to identify efficient machine learning algorithms among the chosen algorithms for forecasting the sales of truck components based on the results obtained.

To address RQ1, several machine learning algorithms suitable for forecasting sales are obtained from the literature review. An efficient machine learning model is then identified by comparing the performances of the chosen models to address RQ2. The performance metrics required for the experiment have also been selected from the literature review (see Chapter 4). Following an analysis of the results, the best performing model is to be used for sales forecasting.


Chapter 2

Background

2.1 Time-series Forecasting

A time series is a time-dependent sequence of observations of a variable [9]. Based on the rate at which the data is collected, time series are categorized into two types: continuous time series and discrete time series. A continuous time series is a sequence of observations made continuously through time. A discrete time series is one where the observations are collected at fixed or equal intervals of time, such as the daily closing price of Google stock, monthly sales of cars, or the yearly rate of change of global temperature. A continuous time series can be sampled at equal intervals of time to form a discrete time series [10]. Time series data can be analyzed for several purposes, such as describing the (seasonal or trend) variations of a series, or using the variations of one series to gain insights into another [10]. Time series forecasting develops a model by analyzing past observations of a time series to describe the relationships within it. This model is then used to predict future values of the series [11]. Time series can be classified into two types:

• Stationary Time Series: If statistical properties such as mean, variance and autocorrelation remain constant over time, the series is called a stationary time series.

• Non-Stationary Time Series: If statistical properties such as mean, variance and autocorrelation change over time, the series is called a non-stationary time series.

A non-stationary time series is volatile and cannot be accurately modeled because of changes in variance, mean and autocorrelation. The results obtained from such a series may be misleading and might imply a relationship between variables that doesn't exist. To achieve accurate results, a non-stationary time series has to be transformed into a stationary one [12].
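A common transformation for this purpose is differencing, which removes a trend by subtracting each observation from a later one. A minimal sketch in Python (illustrative only, not taken from the thesis):

```python
import numpy as np

def difference(series, lag=1):
    """First-order differencing: y'(t) = y(t) - y(t - lag).

    Differencing removes a (linear) trend, a common way to turn a
    non-stationary series into one with a constant mean.
    """
    series = np.asarray(series, dtype=float)
    return series[lag:] - series[:-lag]

# A series with a linear trend has a non-stationary mean ...
trend = np.arange(10, dtype=float)
# ... but its first difference is constant, hence stationary.
print(difference(trend))  # [1. 1. 1. 1. 1. 1. 1. 1. 1.]
```

A seasonal trend can be handled the same way by setting `lag` to the season length (e.g. 52 for weekly data with a yearly cycle).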

2.1.1 Univariate Time Series

A univariate time series consists of a single time-dependent series, where forecasts rely only on the current and past values of the series being forecasted, possibly augmented by a function of time such as a linear trend [13]. Figure 2.1 below shows an example of a univariate time series consisting of the monthly sales of a product.


Figure 2.1: Univariate Time Series

2.1.2 Multivariate Time Series

A time series which contains more than one interdependent variable is known as a multivariate time series. Figure 2.2 below shows an example of a multivariate time series consisting of the monthly sales of a product along with factors that influence those sales.

Figure 2.2: Multivariate Time Series

In the above example, Figure 2.2 includes advertising dollars along with the sales value for one year. In this scenario, more than one variable needs to be considered to optimally predict sales; such a series is a multivariate time series.

2.2 Model Selection

Weekly sales of Volvo truck components are an example of time series data. Retail sales change frequently due to various factors, such as political, economic, social and technological factors [14]. This indicates that retail sales data will be a nonlinear function [11]. Research demonstrates that nonlinear models such as Artificial Neural Networks capture nonlinear relationships better than traditional models [11]. Therefore, Artificial Neural Networks have been used in this thesis to forecast sales.


2.3 Forecasting Models

2.3.1 Artificial Neural Network

Artificial Neural Networks have a parallel distributed structure and are known for their adaptive learning ability [15]. An ANN consists of an interconnected group of nodes called neurons, structured so that every node in the network receives information from the nodes in the preceding layer through weighted connections, while nodes in the same layer are not connected to each other [16]. There are various types of neural networks, such as the Extreme Learning Machine (ELM), Multilayer Perceptron, Recurrent Neural Network, Long Short Term Memory and so on. An ANN typically consists of one input layer, one output layer and one or more hidden layers in between.

Activation Function

An activation function determines the output of a node given a set of inputs, and its output value indicates whether a feature is present in the data or not. During the training phase, the activation function plays a significant role in how the weights of the neural network are adjusted [17]. The following activation functions are used in this thesis:

• Sigmoid Function: The sigmoid function is nonlinear and produces output in the range (0, 1) for a given input value. Its derivative becomes very small for large input values, which can cause the vanishing gradient problem, as there will be almost no change in the predictions [18].

σ(x) = 1 / (1 + e^(−x))   (2.1)

where 'x' is the input and 'e' is Euler's number.

• TanH: Tanh is a nonlinear activation function that produces outputs in the range (−1, 1). Similar to the sigmoid function, its derivative also becomes small for very large input values, causing the vanishing gradient problem [18].

tanh(x) = 2 / (1 + e^(−2x)) − 1   (2.2)

where 'x' is the input and 'e' is Euler's number.

• Rectified Linear Unit (ReLU): Unlike the sigmoid function, the Rectified Linear Unit is not subject to the vanishing gradient problem. It is nonlinear and computationally efficient compared to the sigmoid and tanh activation functions [18].

R(x) = max(0, x) (2.3)


where 'x' is the input; R(x) equals x if x > 0, and 0 otherwise.
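The three activation functions above follow directly from Equations (2.1)–(2.3); a short NumPy sketch (illustrative, not the thesis code):

```python
import numpy as np

def sigmoid(x):
    # Eq. (2.1): squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Eq. (2.2): equal to 2*sigmoid(2x) - 1, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):
    # Eq. (2.3): passes positive values through, zeroes the rest
    return np.maximum(0.0, x)

print(sigmoid(0.0))           # 0.5
print(tanh(0.0))              # 0.0
print(relu(-3.0), relu(3.0))  # 0.0 3.0
```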

2.3.2 Multilayer Perceptron

The Multilayer Perceptron is a type of feed-forward neural network in which information flows from the input layer towards the output layer through the hidden layers, as shown in Figure 2.3. The Rectified Linear Unit is used as the activation function for the Multilayer Perceptron in this thesis. The MLP is trained with a supervised learning algorithm called backpropagation [19], in which the error is propagated backward through the network. The error is calculated as the difference between the network output and the actual output, and the network parameters, called weights, are modified to minimize it. This process is repeated until a stopping condition is reached.

Figure 2.3: Multilayer Perceptron [20]

y = f(x1w1 + x2w2 + ... + xnwn)   (2.4)

where 'x' and 'y' represent the input and output of the network, 'wi' denotes the connection weights between the input layer and the hidden layer, 'wj' denotes the connection weights between the 1st and 2nd hidden layers, 'wk' denotes the connection weights between the hidden layer and the output layer, 'f' represents the activation function, and 'i', 'j', 'k', 'l' represent the nodes in the network.
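Equation (2.4) describes a single layer; stacking such layers gives the MLP forward pass. A minimal NumPy sketch with illustrative weight values (biases omitted to match the equation; this is a hypothetical example, not the thesis implementation):

```python
import numpy as np

def mlp_forward(x, weights, activation):
    """Forward pass of a feed-forward network.

    Each layer computes f(W @ h), i.e. a weighted sum of its inputs
    passed through the activation function, as in Eq. (2.4).
    """
    h = x
    for W in weights:
        h = activation(W @ h)
    return h

relu = lambda z: np.maximum(0.0, z)

x = np.array([1.0, 2.0])
W1 = np.array([[0.5, -0.5], [1.0, 1.0]])  # input -> hidden (illustrative)
W2 = np.array([[1.0, 1.0]])               # hidden -> output (illustrative)
print(mlp_forward(x, [W1, W2], relu))     # [3.]
```

Training would then adjust `W1` and `W2` by backpropagating the error, as described above.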

2.3.3 Recurrent Neural Network

Recurrent Neural Networks (RNNs) are suitable for time series data, as they are able to utilize temporal information from the data through recurrent links between neurons [21]. In a Recurrent Neural Network, information from one time step is passed on to the next using recurrent connections. As shown in Figure 2.4 below, the hidden state at the current time step is influenced by the output of the hidden neurons at earlier time steps, offering a sort of memory in the network [22].


Figure 2.4: Recurrent Neural Network [23]

h(t) = gh(Wxh x(t) + Whh h(t−1))   (2.5)

y(t) = gy(Why h(t))   (2.6)

where 'x' represents the input of the network, 'y' the output of the network, 'gh' the activation function of the hidden layer, 'gy' the activation function of the output layer, 'h' the hidden layer, 'Whh' the connection weights between the hidden layers, 'Wxh' the connection weights between the input layer and the hidden layer, 'Why' the connection weights between the hidden layer and the output layer, and 't' the time step in the network.

RNNs use gradient-based methods such as backpropagation through time for learning, and the weights in the network are updated with these methods. RNNs are difficult to train in cases where the output at time 'T' depends on inputs from a much earlier time 't', with t ≪ T [24]: the gradients used to update the weights start to diminish, resulting in the vanishing gradient problem. This makes RNNs difficult to use for modelling long-term dependencies.
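Equations (2.5) and (2.6) translate into a single recurrent step; a hypothetical NumPy sketch with tanh as gh and the identity as gy (illustrative weights, not the thesis code):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    # Eq. (2.5): the new hidden state mixes the current input with
    # the previous hidden state through the recurrent weights W_hh.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    # Eq. (2.6): the output is computed from the hidden state
    # (identity output activation g_y for simplicity).
    y_t = W_hy @ h_t
    return h_t, y_t

# Toy dimensions: 1 input feature, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((4, 1)) * 0.1
W_hh = rng.standard_normal((4, 4)) * 0.1
W_hy = rng.standard_normal((1, 4)) * 0.1

# Unroll over a short sequence: the hidden state carries memory forward.
h = np.zeros(4)
for x in ([0.1], [0.2], [0.3]):
    h, y = rnn_step(np.array(x), h, W_xh, W_hh, W_hy)
print(h.shape, y.shape)  # (4,) (1,)
```

The repeated multiplication by `W_hh` across many steps is exactly what shrinks (or blows up) the gradients during backpropagation through time.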

2.3.4 Long Short Term Memory

Long Short Term Memory (LSTM) is a type of recurrent neural network. It consists of regulators within LSTM units, called gates. An LSTM can learn long-term dependencies thanks to its gating mechanism and memory cell, and it overcomes the vanishing and exploding gradient problems faced by RNNs. A typical LSTM unit consists of a memory cell, an input gate, an output gate and a forget gate. The memory cell can remember information over arbitrary time periods, and the flow of information to and from the cell is regulated by the gating mechanism [25].

In Figure 2.5 below, ct is the cell state, which carries information through the sequence chain and acts as memory. The forget gate ft determines which information should be eliminated from the cell state: it applies a sigmoid function to the input vector and performs a pointwise multiplication with the previous cell state ct−1. The input gate it determines which values of the input vector [ht−1, xt] to update, and a tanh function generates candidate values from the input vector. The results of the input gate and the tanh function are combined by pointwise multiplication and added to the cell state by pointwise addition. Finally, the output gate ot determines which values to output by applying a sigmoid function to the input vector. The hidden state ht is calculated by passing the cell state through a tanh function and multiplying it pointwise with the output gate. This information is then passed along the chain, and the process is repeated at each step of the sequence.

Figure 2.5: Long Short Term Memory [25]

it = σ(Wxi xt + Whi ht−1 + bi)   (2.7)

ft = σ(Wxf xt + Whf ht−1 + bf)   (2.8)

ct = ft ct−1 + it tanh(Wxc xt + Whc ht−1 + bc)   (2.9)

ot = σ(Wxo xt + Who ht−1 + bo)   (2.10)

ht = ot tanh(ct)   (2.11)

where 'xt' is the input vector of the LSTM unit, 'it' is the activation vector of the input gate, 'ft' is the activation vector of the forget gate, 'ot' is the activation vector of the output gate, 'ct' is the cell state vector, 'ht' is the LSTM unit's output vector, 'σ' is the sigmoid function, and the subscripts 't' and 't−1' indicate the current and previous time steps. 'Wxi' indicates the weight parameters between the input vector and the input gate, 'Wxf' between the input vector and the forget gate, 'Wxc' between the input vector and the cell, and 'Wxo' between the input vector and the output gate; the bias vector parameters are indicated by 'b'. All of these are learned during training.
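Equations (2.7)–(2.11) can be written out directly as one step of an LSTM cell; a hypothetical NumPy sketch in which gate products are elementwise, with illustrative random weights (not the thesis implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (2.7)-(2.11); p holds the
    weight matrices W.. and bias vectors b. named as in the text."""
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])  # Eq. (2.7)
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])  # Eq. (2.8)
    c_t = f_t * c_prev + i_t * np.tanh(
        p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])            # Eq. (2.9)
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])  # Eq. (2.10)
    h_t = o_t * np.tanh(c_t)                                     # Eq. (2.11)
    return h_t, c_t

# Toy dimensions: 1 input feature, 3 hidden units.
rng = np.random.default_rng(1)
p = {name: rng.standard_normal((3, 1)) * 0.1 for name in ("Wxi", "Wxf", "Wxc", "Wxo")}
p.update({name: rng.standard_normal((3, 3)) * 0.1 for name in ("Whi", "Whf", "Whc", "Who")})
p.update({name: np.zeros(3) for name in ("bi", "bf", "bc", "bo")})

h, c = np.zeros(3), np.zeros(3)
for x in ([0.5], [0.1], [0.9]):  # three time steps of a toy sequence
    h, c = lstm_step(np.array(x), h, c, p)
print(h.shape, c.shape)  # (3,) (3,)
```

Because the cell state is updated additively (Eq. 2.9) rather than through repeated matrix multiplication, gradients can flow over many time steps without vanishing.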


Chapter 3

Related Work

A literature review was performed in this thesis to identify suitable machine learning models for forecasting sales, based on prior research on time series forecasting in the sales domain using machine learning algorithms.

Because time series are crucial in determining forecasts, there is a significant amount of literature analysing their behaviour and approaches for obtaining the desired results. For the analysis of time series with linear behavior, Holt-Winters, Box-Jenkins [26] and Autoregressive Integrated Moving Average (ARIMA) [11] models have been widely used, but they fail to provide reasonable forecasts when the time series is nonlinear in nature. Artificial Neural Networks (ANNs) appear to be an effective alternative for addressing this problem [8]. Some of the contributions made in the application of Artificial Neural Networks to sales forecasting are presented below:

Vhatkar and Dias [6] implemented an Artificial Neural Network for time series forecasting. They examined the usefulness of Artificial Neural Networks in time series forecasting and found that the ANN gives optimum results for predicting the sales of consumer goods compared to SVM and ARIMA. Artificial Neural Networks are also known for their ability to map nonlinear functions, which makes them suitable for sales forecasting [8]. The performance of the model is evaluated using Mean Absolute Difference (MAD), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

Choi [7] developed a hybrid model combining the Extreme Learning Machine and the Grey Model to overcome the Grey Model's difficulty in capturing nonlinear events. The ELM is a type of single-layer feed-forward neural network. The authors used only 4 periods of data, obtained from a knitwear fashion company, to train the model to forecast fast fashion sales. The study found that the newly developed hybrid model generates good forecasts even with a limited amount of data and in a short period of time.

Croda, Gibaja Romero and Caballero Morales [8] implemented a Multilayer Perceptron neural network to determine the construction of additional warehouses based on the forecasted sales. The dataset used for this experiment is limited to 12 months, from 2015 to 2016, and consists of the sales of the current distribution centre. From the experiment, the authors found that even though the Multilayer Perceptron does not entirely capture the time series behaviour, it can still produce promising results when the time series is small, compared to traditional methods. Root Mean Square Error (RMSE) is used for evaluating the models.

From the above articles, it can be noted that neural networks perform well compared to other methods, even with a limited amount of data. However, there is no unique approach to sales forecasting using Artificial Neural Networks, because they can have different structures depending on the type of problem. For example, Recurrent Neural Networks perform well on time series forecasting problems, as they have the capacity to preserve time-dependent information [21].

Because of volatility, sales forecasting and modeling can be quite complicated. In one research study, the authors [27] proposed a Recurrent Neural Network to forecast the weekly sales of footwear products. To find the optimum network architecture, the experiment was carried out using two cases: initially the models were trained with four consecutive weeks of sales as input to predict the next week's sales, and then the models were trained again with six consecutive weeks of sales as input. The architecture with the minimum average error across the two cases was chosen for forecasting. Experimental results show that this novel approach to forecasting sales produced promising results. Root Mean Square Error (RMSE) and Mean Square Error have been used for evaluating the models.

Ajla Elmasdotter and Carl Nystromer [11] conducted a comparative study between LSTM and ARIMA to forecast the sales of food products in grocery stores. Their primary aim was to reduce food waste by predicting the sales of food products to optimise inventory for grocery stores. The data used consists of sales data of an Ecuadorian grocery chain from 2013 to 2017. The experiment was carried out using two scenarios: in one, the two models were used to predict one day ahead; in the other, seven days ahead. Experimental results show that LSTM outperformed ARIMA only in the scenario where predictions were made seven days ahead, and performed similarly to ARIMA when predictions were made one day ahead. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) have been used for evaluating the performance of the models.

Based on this review of previous time series forecasting studies, Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory have been chosen as suitable neural network algorithms for forecasting sales, and Root Mean Square Error and Mean Absolute Error have been selected as performance metrics to evaluate the chosen machine learning models. The experiment has been carried out using these three algorithms. A summary of the literature review results is presented in Table 3.1.


Motivation for the Research / Results of the Research

• Performed a comparative study between Long Short Term Memory and ARIMA for forecasting sales in retail [11].
1. LSTM outperformed ARIMA when the predictions are made 7 days ahead and showed similar performance when the predictions are made one day ahead.
2. RMSE and MAE are used to evaluate the performance.

• Implemented an Artificial Neural Network for forecasting the sales of oral care products [6].
1. The Artificial Neural Network produced promising results compared to SVM and ARIMA.
2. RMSE, MAD and MSE are used to evaluate the performance.

• Implemented an Artificial Neural Network and a Recurrent Neural Network for predicting retail sales [27].
1. Recurrent Neural Networks produced reliable forecasts and can be used for generating mid-term forecasts.
2. RMSE and MSE are used to evaluate the performance.

• Performed sales forecasting on retail products using a Long Short Term Memory network [28].
1. Results showed that Long Short Term Memory networks have low forecasting error when the model is used to forecast sales at week level.
2. RMSE and MAE are used to evaluate the performance.

• Compared an Artificial Neural Network and ARIMA for retail sales [29].
1. The Artificial Neural Network produced promising results compared to ARIMA.
2. RMSE, MAPE and MAE are used to evaluate the performance.

• Implemented a hybrid algorithm combining the Extreme Learning Machine and the Grey Model for fast fashion sales [7].
1. GM-ELM produced promising results, showing that neural networks can be used for modelling nonlinear data.
2. RMSE is used for evaluating the performance of the model.

• Implemented a Multilayer Perceptron to forecast the sales of distribution centres [8].
1. Prediction results showed that the Multilayer Perceptron produces desirable results with limited data.
2. RMSE is used to evaluate the performance.

Table 3.1: Short Summary of the literature review results


Chapter 4

Method

The proposed research methodology consists of two methods: a literature review conducted to answer research question RQ1, and an experiment conducted to answer research question RQ2. The methods are implemented sequentially, as the outcome of one method is used to implement the other.

4.1 Literature Review

A literature review is performed to gain insights into different machine learning algorithms for forecasting sales in the time series domain. The advantages of each model and the conditions for its effective application to a specific problem are identified. From the literature review, three machine learning algorithms (Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory) and two performance metrics (Root Mean Square Error and Mean Absolute Error) have been selected.

Inclusion Criteria

• Conference and journal articles were selected.

• Articles which are available in full text.

• Articles which are in English Language.

• Articles which have been published in between the years 2000-2019.

Exclusion Criteria

• Articles which are not in English Language.

• Articles which have been published before the year 2000.

4.2 Experiment

An experiment is chosen as the research method to answer the second research question because an experiment is a suitable method for dealing with quantitative data and gives greater control over variables. The experiment is conducted based on the guidelines given by Wohlin et al. [30].



The goal of this experiment is to evaluate the performance of the machine learning algorithms Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory on weekly sales data extracted from the order management database of Volvo Trucks.

The most efficient algorithm among those chosen is identified by analysing and comparing the results obtained from the experiment on the given dataset.

Independent Variables: Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory neural network.

Dependent Variables: Performance Metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE).

4.2.1 Experimental Setup

• Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory are implemented on the component sales data using a 5-fold walk forward validation approach.

• The performance of each algorithm is evaluated using the performance metrics, and the results are compared to find the most efficient model for the given dataset.

4.2.2 Dataset

The dataset used for this thesis consists of order management information for variants such as the engine brake, front rims and instrument cluster display of a Volvo truck, collected on a weekly basis from 2014 to 2019. The dataset was obtained from the Digital Transformation department of the Volvo Trucks group.

The dataset consists of 20 components of the truck along with airflow packages, which is used as the target component. After preprocessing, the dataset covers about 260 weeks, and it is divided into training and test sets in a continuous manner using walk forward validation, which is explained in Section 4.2.8.

4.2.3 Data Preprocessing

The features in the dataset represent components of a truck whose sales are recorded on a weekly basis. The dataset consists of about 12000 different truck components, and only the core components are selected for analysing the models.

Features with no sales are replaced with zero, and features that consist mostly of null values even after re-sampling are eliminated from the dataset. The dataset is then transformed into a supervised learning dataset using the sliding window approach, which is described in Section 4.2.4.

4.2.4 Sliding Window

Time series data can be transformed into a supervised learning problem by making use of the sliding window method. This transformation enables the use of standard linear and nonlinear machine learning algorithms. A time series dataset is transformed by using previous time steps as inputs and subsequent time steps as output variables.

The sliding window method can be used to reconstruct the time series problem in Figure 4.1a into the supervised learning problem shown in Figure 4.1b.

Figure 4.1: Sliding window: (a) before implementing; (b) after implementing

From Figure 4.1b, the previous time steps X1 and X2 are used as input variables and the next time steps Y1 and Y2 are used as output variables in the supervised learning problem. The first and last rows are removed before training the supervised learning model, as past values are used to predict future values in the sequence. The time series data is checked for stationarity using the Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test before transforming the data into a supervised learning problem.
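As an illustration, the transformation described above can be sketched in a few lines of Python; the sales figures below are made-up toy values, not data from the thesis:

```python
def sliding_window(series, n_lags=1):
    """Turn a univariate series into supervised (X, y) pairs:
    each X row holds the previous n_lags values, y the next value."""
    X, y = [], []
    for i in range(len(series) - n_lags):
        X.append(series[i:i + n_lags])
        y.append(series[i + n_lags])
    return X, y

# Toy weekly sales series (hypothetical numbers):
sales = [112, 118, 132, 129, 121]
X, y = sliding_window(sales, n_lags=2)
print(X)  # [[112, 118], [118, 132], [132, 129]]
print(y)  # [132, 129, 121]
```

With `n_lags=2`, each pair of consecutive weeks becomes the input for predicting the following week, which is exactly the reconstruction sketched in Figure 4.1b.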

4.2.5 Kwiatkowski, Phillips, Schmidt and Shin (KPSS) Test

The Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test is a statistical test used to check whether a time series is stationary. In this study it is used to determine whether a time series is trend stationary or contains a unit root. The null hypothesis of the test is that the time series is stationary or trend stationary, against the alternative hypothesis of a unit root series [31].

• Null Hypothesis: If null hypothesis is accepted, the time series is considered to be trend stationary.

• Alternative Hypothesis: If the null hypothesis is rejected, the time series consists of unit root, meaning it is non-stationary.

The results of the test can be interpreted as follows:

• If p > 0.05, the null hypothesis is not rejected; the time series is trend stationary.

• If p <= 0.05, the null hypothesis is rejected; the time series contains a unit root, i.e., the series is non-stationary.
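The decision rule above can be sketched as a small helper; the function name is illustrative, and in practice the p-value itself would come from an implementation such as `statsmodels.tsa.stattools.kpss`:

```python
def interpret_kpss(p_value, alpha=0.05):
    """Map a KPSS p-value to the decision rule of Section 4.2.5."""
    if p_value > alpha:
        return "trend stationary (fail to reject null)"
    return "non-stationary, unit root (reject null)"

# With statsmodels, the p-value would be obtained along these lines:
#   from statsmodels.tsa.stattools import kpss
#   stat, p_value, n_lags, crit = kpss(series, regression='ct')
print(interpret_kpss(0.10))  # trend stationary (fail to reject null)
print(interpret_kpss(0.01))  # non-stationary, unit root (reject null)
```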


The results of the Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test are discussed in Section 5.1.

4.2.6 Normalization

Normalization is a data preprocessing technique used to bring the features in a dataset to a common scale. It eliminates the influence of one feature on another by transforming all individual features to a fixed range, giving equal importance to every feature [32]. There are several normalization techniques, such as min-max normalization, z-score normalization and sigmoid normalization [33]. Min-max normalization is used in this thesis.

\hat{x}_n = \frac{x_n - x_{n(\min)}}{x_{n(\max)} - x_{n(\min)}} \quad (4.1)

where \hat{x}_n represents the normalised value, x_n the original value, x_{n(\min)} and x_{n(\max)} the minimum and maximum values of the feature variable, and n = 1, 2, 3, ... indexes the feature variables.

Figure 4.2: Normalization: (a) before normalization; (b) after normalization

Figure 4.2a shows a subset of the dataset before normalization, and Figure 4.2b shows the same subset with the feature variables in a fixed range after normalization.
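A minimal sketch of Equation 4.1 applied to one feature column; the sales values are hypothetical:

```python
def min_max_normalize(values):
    """Rescale one feature column to [0, 1] as in Equation 4.1.
    Assumes the column is not constant (max != min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

weekly_sales = [40, 80, 120, 200]  # hypothetical component sales
print(min_max_normalize(weekly_sales))  # [0.0, 0.25, 0.5, 1.0]
```

In practice each feature column of the dataset is rescaled independently, so that every component's sales lie in the same [0, 1] range.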

4.2.7 Feature Importance

Feature importance is used to assess the usefulness of input variables for developing predictive models. It indicates how much each feature contributes towards the model accuracy. Feature importance can also be used to improve model performance by removing features that are insignificant, which reduces the time taken for training the machine learning models. In this thesis, the random forest feature importance measure has been implemented, as random forests are among the most popular machine learning models owing to their robustness and relatively good accuracy [34].

Mean Decrease Impurity (MDI)

MDI counts the number of times a feature is used to split a node, weighted by the number of samples the node splits. Random forest tree-based strategies rank features by how well their splits improve the purity of a node; the impurity measure is the Gini impurity for classification and the variance for regression. Thus, when training a tree, one can compute how much each feature decreases the weighted impurity; the impurity decrease is then averaged over the trees for each feature, and the features are ranked accordingly.

Louppe et al. [35] proposed the Mean Decrease Impurity (MDI) variable importance measure and derived several of its theoretical properties. Given a forest of T randomized trees, each expecting a D-dimensional input x = (x_1, ..., x_D), the Mean Decrease Impurity variable importance for dimension d is computed as [35]:

MDI(x_d) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{v \in t \\ s(v) = x_d}} \frac{N_v}{N} \, \Delta i(v) \quad (4.2)

where N is the total number of examples, N_v is the number of examples reaching inner node v, and \Delta i(v) is the impurity decrease achieved at node v. The inner sum runs over all internal nodes v in tree t where feature dimension d is selected as the split feature [35].

Figure 4.3: Random forest Feature importance

The results obtained using random forest feature importance are shown in Figure 4.3. The features are ranked in descending order of importance with respect to the target variable (airflow packages); feature variables at the top of the list are considered more important than those at the bottom.
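A toy sketch of the per-feature aggregation in Equation 4.2; the forest and its node records below are invented for illustration (in practice a library implementation such as scikit-learn's `feature_importances_` would be used):

```python
def mdi_importance(trees, n_samples, feature):
    """Average impurity decrease for one feature across a forest
    (Equation 4.2): for each tree, sum (N_v / N) * delta_i(v) over
    the nodes v that split on the feature, then average over trees."""
    total = 0.0
    for tree in trees:
        total += sum(n_v / n_samples * delta
                     for feat, n_v, delta in tree if feat == feature)
    return total / len(trees)

# Toy forest of two trees; each node record is
# (split feature, examples reaching the node N_v, impurity decrease):
trees = [
    [("engine_brake", 100, 0.30), ("front_rims", 60, 0.10)],
    [("engine_brake", 100, 0.20)],
]
print(mdi_importance(trees, n_samples=100, feature="engine_brake"))  # 0.25
```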

4.2.8 Walk Forward Validation

Walk forward validation is a re-sampling technique used to evaluate machine learning models. Even though several cross-validation techniques exist, walk forward validation is implemented in this thesis because it preserves the temporal order of the dataset when splitting the observations; values in a time series dataset cannot be randomized during splitting.

Figure 4.4: Walk Forward Validation [36]

This method contains an outer loop for error estimation and an inner loop for hyperparameter tuning, as shown in Figure 4.4. The data is divided into training and test splits in the outer loop. The training split is further divided into train and validation splits in the inner loop, where the model is trained on the training split and the parameters for which the model produces the least error are selected on the validation split. The performance of each model is estimated by taking the average of the errors from all splits [36].

Initially, walk forward validation was carried out with 3, 5 and 7 folds; model performance peaked at 5 folds and degraded when the number of folds was increased further.
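A simple sketch of how such temporally ordered outer-loop splits can be generated; the equal-sized-block scheme below is one plausible variant, not necessarily the exact splitting used in the thesis:

```python
def walk_forward_splits(n_obs, n_folds):
    """Yield (train_idx, test_idx) pairs that preserve temporal order:
    each fold trains on all observations before its test block."""
    fold_size = n_obs // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = list(range(0, k * fold_size))
        test = list(range(k * fold_size, (k + 1) * fold_size))
        yield train, test

# With 260 weeks and 5 folds, each test block spans 43 weeks and the
# training window grows from 43 to 215 weeks:
for train, test in walk_forward_splits(260, 5):
    print(len(train), len(test))
```

Because the training window always ends exactly where the test block begins, no future observation ever leaks into training, which is the property that rules out ordinary shuffled cross-validation here.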

4.2.9 Performance Metrics

Performance metrics should be selected depending on the regression problem and the dataset used for the experiment. The metrics used in this thesis are commonly used for evaluating time series forecasting models.

The metrics used by various authors are described in Chapter 3. Based on the literature review, the following performance metrics have been used to compare the performance of the machine learning models for sales forecasting.

Root Mean Square Error is the square root of the mean squared difference between the values predicted by the model and the observed values. The value of RMSE indicates how well a model fits a particular dataset; values close to zero imply a better fit [37].

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (4.3)

where n is the number of observations, y_i the actual value and \hat{y}_i the predicted (forecasted) value.

Mean Absolute Error is calculated by taking the average of the absolute differences between the values predicted by the model and the actual values [38]. As with RMSE, model accuracy is higher when MAE values are close to zero.

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \quad (4.4)

where n is the number of observations, y_i the actual value and \hat{y}_i the predicted (forecasted) value.

Why is MAE preferred over RMSE?

RMSE is calculated from the sum of squared individual errors, so every individual error influences the total in proportion to its square rather than its magnitude. Therefore, RMSE increases as the variance of the error magnitude distribution increases [39]. This implies that RMSE gives higher importance to large errors than to small ones, which makes it an inadequate indicator of average error. Unlike RMSE, MAE gives equal weight to all individual errors, making it an appropriate measure of average model performance.


Chapter 5

Results

5.1 Stationarity Test

Kwiatkowski, Phillips, Schmidt and Shin Test was used to test the stationarity of the dataset and the results obtained are shown in Figure 5.1.

Figure 5.1: Kwiatkowski, Phillips, Schmidt and Shin Test

It can be seen from Figure 5.1 that the p-value is greater than 0.05, so the null hypothesis cannot be rejected. This indicates that the data is stationary and can be further used for time series analysis.

LSTM, MLP and RNN have been used for forecasting the sales of Volvo truck components, and the following results show how each performed on the performance metrics used in this thesis.

5.2 Learning curve

Learning curves are used to diagnose whether a model underfits, overfits or is a good fit on the chosen dataset. They can also indicate whether the statistical properties of the training dataset are comparable to those of the validation dataset. A learning curve plots loss or error over time to reveal changes in learning performance, and with a validation dataset it shows how well a model generalizes to unseen data. Based on the structure and dynamics of the learning curve, the configuration of the model can be changed to enhance its learning and performance [40].
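As a rough illustration of this diagnosis, the heuristic below compares the final training and validation losses; the thresholds and loss values are arbitrary illustrative choices, not a standard rule:

```python
def diagnose_fit(train_loss, val_loss, gap_tol=0.05):
    """Crude learning-curve reading: both losses should fall to a
    plateau with only a small gap between them for a 'good fit'."""
    if val_loss[-1] > val_loss[0] * 0.9:      # barely improved
        return "underfit"
    if val_loss[-1] - train_loss[-1] > gap_tol:  # large train/val gap
        return "overfit"
    return "good fit"

train = [1.0, 0.5, 0.2, 0.1]   # hypothetical per-epoch losses
val   = [1.1, 0.6, 0.25, 0.12]
print(diagnose_fit(train, val))  # good fit
```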




Figure 5.2: Learning curves: (a) MLP; (b) RNN; (c) LSTM

Figures 5.2a, 5.2b and 5.2c show the learning curves of the Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory. In all three cases the training and validation errors decrease to an optimal point with only a small difference in loss between them, indicating that the machine learning models used in this thesis are a good fit.



5.3 Forecasts

5.3.1 Multilayer Perceptron

The Multilayer Perceptron is trained using the 5-fold walk forward validation approach, with the data split in an 80:20 ratio: 80% of the data is used as the training set and 20% as the test set. The training set is further split into training and validation sets, as described in Section 4.2.8. The performance of the model is estimated using the performance metrics RMSE and MAE, and the results obtained are shown below.

Table 5.1 shows the Root Mean Square Error and Mean Absolute Error values obtained by the Multilayer Perceptron using 5-fold walk forward validation: an RMSE of 0.225 and an MAE of 0.129.

Performance Metrics Multilayer Perceptron

Mean Absolute Error 0.129

Root Mean Square Error 0.225

Table 5.1: Multilayer Perceptron algorithm performance

The Figure 5.3 represents the actual sales and the forecasted sales of the target variant (airflow packages) using Multilayer Perceptron algorithm where the blue line indicates the predicted value of the target variant and the orange line represents the actual values.

Figure 5.3: MLP forecasts



5.3.2 Recurrent Neural Network

The Recurrent Neural Network is trained using the 5-fold walk forward validation approach, with the data split in an 80:20 ratio: 80% of the data is used as the training set and 20% as the test set. The training set is further split into training and validation sets, as described in Section 4.2.8. The performance of the model is estimated using the performance metrics RMSE and MAE, and the results obtained are shown below.

Table 5.2 shows the Root Mean Square Error and Mean Absolute Error values obtained by the Recurrent Neural Network using 5-fold walk forward validation: an RMSE of 0.150 and an MAE of 0.05.

Performance Metrics Recurrent Neural Network

Mean Absolute Error 0.05

Root Mean Square Error 0.150

Table 5.2: Recurrent Neural Network algorithm performance

Figure 5.4 shows the actual and forecasted sales of the target variant (airflow packages) using the Recurrent Neural Network, where the blue line indicates the predicted values of the target variant and the orange line represents the actual values.

Figure 5.4: RNN forecasts



5.3.3 Long Short Term Memory

The Long Short Term Memory network is trained using the 5-fold walk forward validation approach, with the data split in an 80:20 ratio: 80% of the data is used as the training set and 20% as the test set. The training set is further split into training and validation sets, as described in Section 4.2.8. The performance of the model is estimated using the performance metrics RMSE and MAE, and the results obtained are shown below.

Table 5.3 shows the Root Mean Square Error and Mean Absolute Error values obtained by Long Short Term Memory using 5-fold walk forward validation: an RMSE of 0.1055 and an MAE of 0.0397.

Performance Metrics Long Short Term Memory

Mean Absolute Error 0.0397

Root Mean Square Error 0.1055

Table 5.3: Long Short Term Memory algorithm performance

The Figure 5.5 represents the actual sales and the forecasted sales of the target variant (airflow packages) using Long Short Term Memory Algorithm where the blue line indicates the predicted value of the target variant and the orange line represents the actual values.

Figure 5.5: LSTM forecasts


Chapter 6

Analysis and Discussion

6.1 Analysis of Experiment results

6.1.1 Performance analysis using Mean Absolute Error

Figure 6.1: Mean Absolute Error of machine Learning models using walk forward validation

Figure 6.1 shows the Mean Absolute Error of the Recurrent Neural Network, Long Short Term Memory and Multilayer Perceptron algorithms across the different walk forward validation splits. Long Short Term Memory performed best in every split, while the Multilayer Perceptron has the highest error across all splits.

6.1.2 Performance analysis using Root Mean Square Error

Figure 6.2 shows the Root Mean Square Error of the Recurrent Neural Network, Long Short Term Memory and Multilayer Perceptron algorithms across the different walk forward validation splits. Again, Long Short Term Memory performed best in every split and the Multilayer Perceptron has the highest error across all splits.




Figure 6.2: Root Mean Square Error of machine Learning models using walk forward validation

6.1.3 Key Findings

In this thesis, the accuracy of the MLP, RNN and LSTM machine learning algorithms in forecasting the sales of truck components has been compared. The average errors indicate that the LSTM-based forecasting algorithm performed better than the other two, with a Root Mean Square Error of about 0.10559 and a Mean Absolute Error of about 0.03974. The better performance of the LSTM can be attributed to its ability to capture long-term dependencies.

The LSTM's performance might improve further if additional features, discussed in Section 7.1, were taken into consideration. The performance of the models on the chosen performance metrics is shown below.

Algorithms Mean Absolute Error Root Mean Square Error

Multilayer Perceptron 0.12944 0.22514

Recurrent Neural Network 0.0570 0.1507

Long Short Term Memory 0.03974 0.10559

Table 6.1: Comparison of performance evaluation results
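The figures in Table 6.1 are means over the per-split errors from walk forward validation (Section 4.2.8); the aggregation can be sketched as follows, with hypothetical per-fold values:

```python
def average_fold_errors(fold_errors):
    """Final score = mean of the per-split errors obtained from
    walk forward validation."""
    return sum(fold_errors) / len(fold_errors)

# Hypothetical per-fold RMSE values for one model across 5 splits:
fold_rmse = [0.12, 0.10, 0.11, 0.09, 0.105]
print(round(average_fold_errors(fold_rmse), 3))  # 0.105
```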

6.1.4 Discussion

RQ1: How can the machine learning methods be selected to resolve the sales forecasting problem?

Answer: Based on the literature review, three machine learning models, namely Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory, have been selected for forecasting the sales of Volvo truck components.

RQ2: Which machine learning algorithm is efficient for forecasting the sales of truck components?

Answer: Long Short Term Memory proved to be the most efficient machine learning algorithm. In the experiment, Long Short Term Memory performed better than the Multilayer Perceptron and Recurrent Neural Network because it makes use of the temporal information in the data. Its average Root Mean Square Error using 5-fold walk forward validation is 0.10559 and its average Mean Absolute Error is 0.03974, both lower than those of the Multilayer Perceptron and Recurrent Neural Network. The performances of the models are discussed in Sections 6.1.1 and 6.1.2.

6.2 Limitations

• This study is performed on the core component family of the truck, and the results might differ when a different set of truck components is used, owing to variations in the sales data of those components. Hence, it cannot be guaranteed that these models will produce similar results on other components.

• Some factors influencing sales, such as incentives, discounts offered on specific variants, quality and performance, were not available. The performance of the models used in this thesis might have improved if these were included in the dataset.

6.3 Validity threats

6.3.1 Internal validity

Internal validity refers to the extent to which the research is carried out correctly, without systematic error [41]. Issues related to missing data or duplicate values generated during re-sampling are mitigated by maintaining a cloud backup containing the records of the experiments conducted.

6.3.2 External validity

External validity refers to the extent to which the findings of a research study can be generalized to different groups or populations [42]. This validity is achieved by using historical sales data of components for training and testing the neural network algorithms. The threat of variable variance is mitigated by specifying all the dependent variables of this analysis so that they remain relevant in any general experimental environment.



6.3.3 Conclusion validity

Conclusion validity holds if the findings of a study are justified properly [43]. Risks arise when the measures used for evaluation are not selected properly, which can lead to incorrect interpretation of the relationship between the independent and dependent variables. These threats are mitigated by maintaining an appropriate configuration of the experimental setup and methods, and by choosing multiple performance metrics to analyse and evaluate the predictions generated by the neural network models.


Chapter 7

Conclusions and Future Work

In this research study, three Artificial Neural Network algorithms, Multilayer Perceptron, Recurrent Neural Network and Long Short Term Memory, are identified as suitable machine learning algorithms for predicting the sales of Volvo truck components.

Among several machine learning models, Artificial Neural Networks have been selected because they can effectively handle non-linear data. The three algorithms are evaluated using the RMSE and MAE performance metrics, and the model with the lowest values is considered the most efficient for generating forecasts. Based on the results of the experiment, Long Short Term Memory performed better than MLP and RNN, generating predictions with a Mean Absolute Error of 0.03974 and a Root Mean Square Error of 0.1055. The forecasts can then be used to adjust stock levels according to the predictions.

7.1 Future Work

The forecasts can be further improved by including influential factors as feature variables in the dataset: incentives and discounts offered on specific variants; technological factors such as quality, performance and functionality; and policies adopted by the organization to increase the profit margin. These factors can help explain variations in the time series data, allowing the model performance to be improved further to generate reliable forecasts.

Potential future work involves comparing the results of the Artificial Neural Networks used in this thesis with results obtained by developing hybrid models that combine neural networks with other traditional models, which could help researchers achieve better results.



References

[1] M. Bohanec, M. K. Borštnar, and M. Robnik-Šikonja, “Explaining machine learning models in sales predictions,” Expert Systems with Applications, vol. 71, pp. 416–428, 2017.

[2] M. Orra and G. Serpen, “An exploratory study for neural network forecasting of retail sales trends using industry and national economic indicators,” in Pro- ceedings for the 4th Workshop on Computational Intelligence in Economics and Finance, pp. 875–878, 2005.

[3] N. Gordini and V. Veglio, “Customers churn prediction and marketing reten- tion strategies. an application of support vector machines based on the auc parameter-selection technique in b2b e-commerce industry,” Industrial Market- ing Management, vol. 62, pp. 100–107, 2017.

[4] A. Sachdev and V. Sharma, “Stock forecasting model based on combined fuzzy time series and genetic algorithm,” in 2015 International Conference on Com- putational Intelligence and Communication Networks (CICN), pp. 1303–1307, IEEE, 2015.

[5] F. Chen and T. Ou, “Sales forecasting system based on gray extreme learning machine with taguchi method in retail industry,” Expert Systems with Applica- tions, vol. 38, no. 3, pp. 1336–1345, 2011.

[6] S. Vhatkar and J. Dias, “Oral-care goods sales forecasting using artificial neural network model,” Procedia Computer Science, vol. 79, pp. 238–243, 2016.

[7] T.-M. Choi, C.-L. Hui, N. Liu, S.-F. Ng, and Y. Yu, “Fast fashion sales forecast- ing with limited data and time,” Decision Support Systems, vol. 59, pp. 84–92, 2014.

[8] R. M. C. Croda, D. E. G. Romero, and S.-O. C. Morales, “Sales prediction through neural networks for a small dataset,” IJIMAI, vol. 5, no. 4, pp. 35–41, 2019.

[9] D. C. Montgomery, C. L. Jennings, and M. Kulahci, Introduction to time series analysis and forecasting. John Wiley & Sons, 2015.

[10] C. Chatfield, The analysis of time series: an introduction. Chapman and Hall/CRC, 2003.

[11] A. Elmasdotter and C. Nyströmer, “A comparative study between lstm and arima for sales forecasting in retail,” 2018.



[12] T. Iordanova, “An introduction to stationary and non-stationary processes.” https://www.investopedia.com/articles/trading/07/stationary.asp, Nov 2019.

[13] C. Chatfield, Time-series forecasting. Chapman and Hall/CRC, 2000.

[14] D. Fantazzini and Z. Toktamysova, “Forecasting german car sales using google data and multivariate models,” International Journal of Production Economics, vol. 170, pp. 97–135, 2015.

[15] H. Li, L. Pan, M. Chen, X. Chen, and Y. Zhang, “Rbm-based back propagation neural network with bsasa optimization for time series forecasting,” in 2017 9th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), vol. 2, pp. 218–221, IEEE, 2017.

[16] H. Weytjens, E. Lohmann, and M. Kleinsteuber, “Cash flow prediction: Mlp and lstm compared to arima and prophet,” Electronic Commerce Research, pp. 1–21, 2019.

[17] T. Hossen, S. J. Plathottam, R. K. Angamuthu, P. Ranganathan, and H. Salehfar, “Short-term load forecasting using deep neural networks (dnn),” in 2017 North American Power Symposium (NAPS), pp. 1–6, IEEE, 2017.

[18] Geva, “7 types of activation functions in neural networks: How to choose?” https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/.

[19] M. Sodanil and P. Chatthong, “Artificial neural network-based time series analysis forecasting for the amount of solid waste in bangkok,” in Ninth International Conference on Digital Information Management (ICDIM 2014), pp. 16–20, IEEE, 2014.

[20] S. Vieira, W. Pinaya, and A. Mechelli, “Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications,” Neuroscience & Biobehavioral Reviews, vol. 74, 01 2017.

[21] U. Ugurlu, I. Oksuz, and O. Tas, “Electricity price forecasting using recurrent neural networks,” Energies, vol. 11, no. 5, p. 1255, 2018.

[22] J. Dai, P. Zhang, J. Mazumdar, R. G. Harley, and G. Venayagamoorthy, “A comparison of mlp, rnn and esn in determining harmonic contributions from nonlinear loads,” in 2008 34th Annual Conference of IEEE Industrial Electronics, pp. 3025–3032, IEEE, 2008.

[23] Z. Cui, R. Ke, Y. Wang, et al., “Deep stacked bidirectional and unidirectional lstm recurrent neural network for network-wide traffic speed prediction,” in 6th International Workshop on Urban Computing (UrbComp 2017), 2016.

[24] Y. Bengio, P. Simard, P. Frasconi, et al., “Learning long-term dependencies with gradient descent is difficult,” IEEE transactions on neural networks, vol. 5, no. 2, pp. 157–166, 1994.


[25] Z. Wang and Y. Lou, “Hydrological time series forecast model based on wavelet de-noising and arima-lstm,” in 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), pp. 1697–1701, IEEE, 2019.

[26] Z. Tang, C. De Almeida, and P. A. Fishwick, “Time series forecasting using neural networks vs. box-jenkins methodology,” Simulation, vol. 57, no. 5, pp. 303–310, 1991.

[27] P. Das and S. Chaudhury, “Prediction of retail sales of footwear using feedforward and recurrent neural networks,” Neural Computing and Applications, vol. 16, no. 4-5, pp. 491–502, 2007.

[28] Q. Yu, K. Wang, J. O. Strandhagen, and Y. Wang, “Application of long short-term memory neural network to sales forecasting in retail—a case study,” in International Workshop of Advanced Manufacturing and Automation, pp. 11–17, Springer, 2017.

[29] C.-W. Chu and G. P. Zhang, “A comparative study of linear and nonlinear models for aggregate retail sales forecasting,” International Journal of Production Economics, vol. 86, no. 3, pp. 217–231, 2003.

[30] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén, Experimentation in software engineering. Springer Science & Business Media, 2012.

[31] D. Kwiatkowski, P. C. Phillips, P. Schmidt, and Y. Shin, “Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?,” Journal of econometrics, vol. 54, no. 1-3, pp. 159–178, 1992.

[32] “Data normalization and standardization.” https://docs.google.com/document/d/1x0A1nUz1WWtMCZb5oVzF0SVMY7a_58KQulqQVT8LaVA/editheading=h.bt7jdccuynnx.

[33] J. Shahrabi, E. Hadavandi, and S. Asadi, “Developing a hybrid intelligent model for forecasting problems: Case study of tourism demand time series,” Knowledge-Based Systems, vol. 43, pp. 112–122, 2013.

[34] E. Lewinson, “Explaining feature importance by example of a random forest,” Aug 2019.

[35] G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts, “Understanding variable importances in forests of randomized trees,” in Advances in neural information processing systems, pp. 431–439, 2013.

[36] C. Cochrane, “Time series nested cross-validation,” May 2018.

[37] Stephanie, “Rmse: Root mean square error.” https://www.statisticshowto.datasciencecentral.com/rmse/, Oct 2019.


[38] L. Frías-Paredes, F. Mallor, M. Gastón-Romeo, and T. León, “Dynamic mean absolute error as new measure for assessing forecasting errors,” Energy Conversion and Management, vol. 162, pp. 176–188, 2018.

[39] C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance,” Climate Research, vol. 30, no. 1, pp. 79–82, 2005.

[40] “Learning curves to diagnose machine learning model performance.” https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/, Aug 2019.

[41] B. A. Kitchenham, D. Budgen, and P. Brereton, Evidence-based software engi- neering and systematic reviews, vol. 4. CRC press, 2015.

[42] R. McDermott, “Internal and external validity,” Cambridge handbook of experi- mental political science, pp. 27–40, 2011.

[43] M. A. García-Pérez, “Statistical conclusion validity: Some common threats and simple remedies,” Frontiers in psychology, vol. 3, p. 325, 2012.


Appendix A

Supplemental Information
