
U.U.D.M. Project Report 2016:22

Degree project in mathematics, 30 credits
Supervisor: Maciej Klimek
Examiner: Erik Ekström
June 2016

Department of Mathematics Uppsala University

Forecasting of a Loan Book Using Monte Carlo Methods

Peter Cassar


Declaration of Authorship

I, Peter CASSAR, declare that this thesis titled, “Forecasting of a Loan Book Using Monte Carlo Methods” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:


"Min ibidill il miken ibidil i vintura
haliex liradi ’al col xibir sura
hemme ard bayad v hemme ard seude et hamyra
Hactar min hedann heme tred mine tamara."

Pietru Caxaro


UPPSALA UNIVERSITY

Abstract

Department of Mathematics

Master in Financial Mathematics

Forecasting of a Loan Book Using Monte Carlo Methods by Peter CASSAR

This thesis tackles a problem proposed by a Payments Solution Provider in Sweden, where the goal is to provide a forecast for liquidity needs over different time frames. An algorithm was tailored to this problem, based on time series modeling of the number of daily executed purchases, combined with bootstrap sampling to simulate both the monetary value and the repayment of said purchases. Monte Carlo techniques were used to provide estimates for the daily executed and received transactions.


Acknowledgements

I would like to thank my supervisor, Professor Maciej Klimek, for the help and support given throughout this thesis project.


Contents

Declaration of Authorship

Abstract

Acknowledgements

1 Introduction to the Problem

2 Generation of Simulated Data
2.1 Choice of Trend and Seasonality for Executed Transactions
2.2 Repayment Options

3 The Algorithm
3.1 Implementation
3.2 Estimating Intensities $\lambda_{x,t}$
3.3 Simulation of Executed Purchases
3.4 Simulation of Received Payments

4 The Forecast
4.1 Results
4.2 The Loan Book

5 Conclusion
5.1 Summary
5.2 Future Work

A Bootstrap Historical Simulation

B Time Series Estimates for $\lambda_{x,t}$
B.1 Elimination of Trend and Seasonal Component
B.2 Testing the Residuals

C MATLAB m-files
C.1 forecasting.m
C.2 boostrapHistoricalExecuted2.m
C.3 truncnormrnd.m
C.4 conditionrandsample.m
C.5 deSeason.m
C.6 deTrendSeason.m
C.7 instalment.m
C.8 simulateOpenTS.m

Bibliography


List of Figures

1.1 Reasons to discontinue a purchase
2.1 Executed and Received Daily Transactions
3.1 The Forecasting Algorithm
3.2 Poisson Residuals vs Normal Residuals
4.1 Forecast for the next year
4.2 Forecast for the next 100 days
4.3 Loan Book Forecast - One year
4.4 Loan Book Forecast - 100 days
4.5 Loan Book Forecast - with strategic financing
A.1 Forecast for Repayment Times
A.2 Forecast for Repayment Times using conditional Bootstrap
B.1 Daily Transactions for the Invoice Repayment Option
B.2 Figures showing the fit using MA and LS polyfit for the 1st trend estimate respectively
B.3 Figures showing histograms for the residuals obtained using MA and LS polyfit for the 1st trend estimate respectively


Chapter 1

Introduction to the Problem

For an economy to function properly it is important that there is a fair, safe and efficient way in which individuals can purchase items or services. For a very long time this need was satisfied by cash and later by other paper-based payments such as cheques or paper-based credit transfers, but new payment methods are being created all the time with advancements in technology. Consumers are moving away from cash and other paper-based payment methods to faster and more convenient methods such as mobile payments, card payments and internet banking [8]. Tech and mobile companies have jumped in and are trying to create these new payment methods and hence capitalize on the new revenue streams [1].

These new payment methods have made their way into virtually every industry, and there are very few things, if any, that one can only pay for with cash. Here in Sweden cash usage is decreasing, and the percentage of cash transactions in shops fell from 39% in 2010 to about 20% in 2014 [8]. Making sure all these revenue channels are open is vital for every business. Nowadays one can find many retailers that accept mobile transfers such as SWIFT transfers for payment. This is becoming more and more common with the ever increasing penetration of smartphones (73%) and tablets (53%) [7]. Another major channel for retailers is of course the internet. The internet has provided retailers with the means to grow and expand faster than ever, which means that a secure and easy way for consumers to pay online is necessary.

Furthermore, it is crucial that retailers offer a wide array of payment methods. As can be seen in Figure 1.1 below, taken from [7], not offering a popular payment method might discourage a visiting client from becoming a paying client.


FIGURE 1.1: Table taken from [7] showing survey results detailing reasons online shoppers stop a purchase.

These new, and arguably necessary, ways to pay have opened up a market for independent Payment Solution Providers (PSPs) beyond the traditional financial institutions such as banks. For PSPs it is important to be better than banks on a number of parameters, such as speed, security, price and convenience [1].

A major advantage gained when using a PSP is the time saved in building the system, as a merchant can tap into an already developed platform which is connected to many payment channels, usually in multiple currencies. These technical connections are usually managed and maintained by the PSP itself. A merchant using a well established PSP also gains the advantage of familiarity for the consumer, which makes a customer feel more secure while shopping online. Other services offered by PSPs are fraud protection and reporting, and PSPs also provide the merchants' clients with the possibility of consumer credit. For these services PSPs usually charge their merchants either a fixed cost per transaction or a percentage of the total transaction. However, apart from being a payment gateway for the merchant, they usually also offer the merchant's customers different repayment options. While the merchant will in most cases be transferred the money instantaneously, the PSP might give the customer flexibility on when to pay for the purchase, or perhaps the ability to spread the payment over a number of months. For this service the PSP might opt to charge customers interest on the purchase, depending on the market.

In this project, a problem suggested by a well established, anonymous PSP in the Swedish market, based on the structure of their own operations, will be tackled. The operations of this PSP are structured in a way that ensures a smooth process for both the consumer and the merchant. They have entered an agreement with a bank and are able to finance the merchants for 100% of each transaction. Hence, rather than the PSP simply being a payment gateway, purchases by registered customers are paid for from a common account, which will be referred to as the "loan book". All executed transactions from all merchants and across all channels must be funded from this loan book. Once the customer makes a repayment, this received transaction also goes into the loan book. Intuitively, this loan book is financed by loans from the bank in which it resides. This way the risk of defaulting on a purchase is shifted onto the PSP rather than the merchant (depending on the agreement between merchant and PSP). This is another selling point of the PSP to the merchant.

It is vital that there is enough liquidity in the loan book to cope with all of the executed transactions. Otherwise the PSP risks fees from the bank and perhaps even interruptions to the service, which would mean severe consequences for the brand. As mentioned, this account is financed through a loan from the bank in which it resides. One can easily see the fine line that needs to be walked in order to ensure that there will always be enough liquidity to finance executed transactions while at the same time minimizing the costs associated with taking the loan. For this reason it is very desirable to have a model that forecasts both the executed and the received transactions during a certain period. With this forecast, better decisions can be taken on how much liquidity to have in the loan book at a given time. In particular, the PSP would like to forecast the liquidity in the loan book over the next day, week, month and quarter.

This problem could be formulated as a stochastic optimization problem on when and which loans to take, with the aim of minimizing the costs of borrowing while keeping the probability of running out of liquidity below a certain threshold. To do it this way, one needs to build a probability model of the liquidity needs over time as well as obtain a list of all the possible agreements that can be made with the bank for financing. This is by no means easy, as one needs to know not only the expected value of the loan book at a given time, but also a fairly accurate probability distribution for the liquidity in the loan book at different points in time.

This approach was not pursued due to the unwillingness of the PSP in question to divulge its confidential agreements with the bank. Moreover, it seems that the options in such agreements might evolve over time. Instead the focus was on creating good forecasts for the executed and received transactions in the loan book, which would then be provided to the treasury manager and chief financial officer as projections for review and assessment.

Another approach to this problem, suggested by the aforementioned PSP, is to attempt to build a probability distribution for the repayment behaviors of customers. This distribution would then be used to generate simulations and estimate the liquidity of the loan book during a particular period by estimating when open debt will be repaid. One must note that while this would help in forecasting the received transactions, it alone does not suffice, as one must also forecast the executed transactions.

This method requires an in-depth study of customer behavior, attempting to find variables that influence how customers repay their debts.

Possible variables are age category, earning bracket, gender and credit score.

After partitioning the customer set into smaller sets using these variables, a probability distribution for repayment times could be found, or rather simulated, for each. Techniques such as Markov Chain Monte Carlo can be used to efficiently simulate from these distributions without actually finding the target distributions. For accuracy these distributions need to be conditioned on the amount of debt owed by the customer, as this will almost certainly have an effect. This technique, coupled with an appropriate method for forecasting executed purchases, is promising and might give further insights about different customer sets. Such a study would be useful for a PSP, as it can give beneficial information such as which customers are more likely to default or more likely to incur late payment fees.

This method was also not pursued, as due to confidentiality agreement issues it was not possible to obtain the large volume of data required from the PSP to conduct such a study. Instead, discussions were held and an idea of the volume of transactions that occur daily was acquired. This was complemented with an explanation of the different factors that influence the daily number of transactions as well as their size. Using this information, simulated meta-data was generated to fairly accurately depict the aforementioned PSP's data. The only information generated for these transactions is the time of execution, the size of the transaction and the campaign chosen. This is elaborated on further in Chapter 2.

Given the data limitations, the goal of the project was to create a robust algorithm that can easily be adapted to provide good forecasts for different historical data. An approach by [6] to estimate the intraday liquidity risks of different financial institutions was found to be transferable to the problem at hand. In that paper the author attempts to create an algorithm to help improve the liquidity risk management of financial institutions, hopefully providing a basis for regulations in that field. In León [6] the author considers only the volume and size of the transactions and the financial institution from which they originate.

No importance is given to what the transaction was for or which client executed it. Since the goal of this project is to create an algorithm for equally unspecific data, the algorithm in [6] was used as a basis for the one in this project. There are some important differences between the problem tackled by [6] and the one in this project, mainly in the relationship between the executed and received transactions, the fact that in this case there is only one financial institution, and the time frames of interest.

In this project the algorithm simulates trajectories of the executed and received transactions and then uses the average of these trajectories as estimates of how much liquidity is in the loan book over a number of time frames. This algorithm is explained in detail in Chapter 3.


Chapter 2

Generation of Simulated Data

For the algorithm in this project, historical data is needed. While no real data was provided, lengthy talks were held with a relatively new but well established PSP. Using information gathered from these meetings, a rough idea of what real data would look like was obtained. An attempt was made to simulate this data in terms of volume and behavior.

Before starting the generation of this simulated data a few things needed to be decided on:

• An average for the amount of daily purchases

• The trend

• The seasonal effects

• Distribution of the size of purchases

• Repayment options offered

No data about the volume of purchases was acquired due to the sensitive nature of that information. To make this project as realistic as possible, a fairly large number of daily transactions was chosen, averaging out to about 6000 executed purchases per day. The size of a purchase depended on which repayment option was chosen: a relation is assumed between the size of the purchase and how the buyer decided to pay it back.

This is elaborated on further when describing the repayment options. The data was generated from the 1st of January 2014 up to the 25th of April 2016.

2.1 Choice of Trend and Seasonality for Executed Transactions

For the trend factor a relatively fast growth was simulated, as this was thought to better represent the liquidity challenges faced by a PSP in managing its loan book. The trend line chosen was linear, representing a 15% growth in the number of daily transactions. Added to this is a random number of daily transactions generated using a truncated normal distribution, as sketched below.
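As an illustration, here is a minimal MATLAB sketch of how such a daily count series could be generated. The noise parameters and truncation bounds are assumptions made for this example only; the actual generator and its parameters are given in Appendix C.

% Sketch of the simulated daily purchase counts (illustrative parameters)
nDays = 846;                               % 1 Jan 2014 - 25 Apr 2016
base  = 6000;                              % average daily purchases
trend = linspace(0, 0.15*base, nDays)';    % linear 15% growth in the mean

% Truncated normal noise on [lo, hi] via the inverse-CDF method
mu = 0; sigma = 200; lo = -500; hi = 500;  % assumed values
Fa = normcdf(lo, mu, sigma);
Fb = normcdf(hi, mu, sigma);
noise = norminv(Fa + rand(nDays,1).*(Fb - Fa), mu, sigma);

daily = round(base + trend + noise);       % seasonal and holiday effects
                                           % are added on top of this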

A PSP may offer its services to various industries. Since we will not stratify the data into different industries, which industries are serviced is of no importance for the scope of this project. Rather, what is of more importance is the behavior of the overall monetary sum of purchases.

It was assumed, as was related to the author, that there would be a low season in the first four months of the year, an average season during the middle four months, and a high season during the last four months, peaking around Christmas time.

It was noted that there are holidays that are particularly consumer oriented. These give rise to local peaks. For the purposes of this project the holidays considered are listed below, in ascending order with respect to the size of their effect.

• Halloween

• Mother’s Day

• Father’s Day

• Valentine’s Day

• Christmas /New Year

The effect of each of these holidays was simulated by adding a gradually increasing number of purchases to the days leading up to the holiday. This increase was also a truncated normal with varying parameters, which are clearly explained in the MATLAB code in Appendix C. With the exception of Christmas, the effect ends abruptly the day after the holiday. The added number of purchases was simulated using a truncated normal distribution with a positive mean that increases with proximity to the holiday.

Another monthly effect noted by people in the industry is that of peaks in repayments of purchases around the pay day. In Sweden there is significant harmony regarding when employees usually receive their paycheck, which is the 25th of every month. This leads to a significant effect, which is explained in more detail in the next section.

2.2 Repayment Options

When one uses a payment solution company, one usually has several options for paying for the purchase. Each option comes with its own structure and fees. For the scope of this project five repayment options were simulated: one is receiving an invoice at home, and the others are referred to as "campaigns", where the customer repays the purchase in equal lump sum transfers over a number of months. The purchase sizes for each option are drawn as shown in the sketch after this list.

• Invoice Repayment

The customer will receive an invoice to their registered address within 10 working days and then pay for the whole purchase via a credit card. The customer is given 10 working days from the arrival of the invoice to repay the purchase. The purchase sizes for this repayment method were generated using a truncated normal, $N_{[0,10000]}(1000, 400)$.

• 3-month or 6-month Campaign

The customer will receive monthly notifications to pay a fraction of the full amount. The customer is given 10 working days from each notification to pay. The purchase sizes for these repayment methods were generated using truncated normals, $N_{[0,10000]}(1500, 400)$ and $N_{[200,10000]}(2000, 400)$ respectively.

• 9-month or 12-month Campaign

The customer will receive monthly notifications to pay a fraction of the full amount plus 2% interest on the outstanding balance. The customer is given 10 working days from each notification to pay. The purchase sizes for these repayment methods were generated using truncated normals, $N_{[500,10000]}(4000, 400)$ and $N_{[1000,10000]}(5000, 400)$ respectively.

After the volume of purchases per day is created and an amount for each purchase is chosen, every purchase is associated with one of these five repayment methods.

The distribution of the purchase size for each repayment method was different as larger purchases are more likely to be paid over a longer period.

The received transactions in this project arise from executed transactions. It was assumed that every purchase will be paid back and that customers stick to the conditions of the repayment method they have chosen.

In the case of invoice payments, customers are assumed to receive a notification asking for payment ten working days after the purchase, which they then pay back within fifteen working days.

For all other repayment options, customers are assumed to receive notifications asking for the monthly installment every twenty working days, which they then pay back within fifteen working days. It is assumed that the number of defaulting customers is insignificant, hence every purchase is repaid in full. This assumption is somewhat justified by the company analysis by REDEYE, where it is stated that in Scandinavia non-performing credit amounts to less than 1% [7].

Since when the received transactions arrive depends on the execution day of the purchase and the conditions of the respective repayment method, some seasonality carries over from the executed transactions. However, it is intuitive that the pay day will also affect the received transactions. It was assumed that customers were more willing to pay on a date closer to the pay day if they could; hence some repayments were delayed up to a few days after the next pay day.


FIGURE 2.1: Daily Executed and Received Transactions

Figure 2.1 shows the simulated historical data generated for this project. While the volume of transactions starts out low, there is significant growth, which is to be expected of a relatively young company that has established itself well on the market. Christmas time is the biggest seasonal effect on the executed purchases, and the pay day is by far the biggest effect on the received transactions. The simulated data follows the trends and properties described by people working in the industry. One property not included in the simulated data is that of irregular jumps arising when new merchants are added to the system. These jumps were hard to quantify and simulate without hard data.


Chapter 3

The Algorithm

FIGURE 3.1: The Forecasting Algorithm (flow chart).


The algorithm presented here is an adaptation of the one given in [6]. The goal of the algorithm is to forecast the position of the "loan book" over the next T days. This is achieved by simulating, Q times, the executed and received transactions over the next T time frames for each repayment option (each time frame represents a day). From these simulations we make our predictions.

The algorithm simulates executed purchases and their monetary value using a bootstrap simulation. Unlike in the algorithm in [6], the executed and received payments are not generated using a bi-variate Poisson process, because in the scenario presented in this project the received payments depend directly on (and are not simply correlated with) the executed purchases.

In other words, every received payment can be directly traced to a past executed payment. Moreover, the total sum of received transactions is predetermined by the number of executed transactions and the repayment methods chosen. If the assumption of no defaulting customers is relaxed, one can still use this number of received transactions as a guideline or an upper bound. Had the received and executed transactions been generated using a bi-variate Poisson process, the algorithm would have ignored valuable information found in the historical records and in the relationship between the executed and received transactions. This also means that each time frame cannot have its own independent simulation, as received payments in a certain time frame depend solely on executed purchases in some previous time frame. Hence each simulation incorporates all the desired time frames together.

3.1 Implementation

The implementation of this algorithm was done in MATLAB. The algorithm starts by retrieving data from the payments database. This data includes all incomplete repayments, along with the time elapsed since their last installment, and vectors containing repayment times for received transactions. The model centers around Monte Carlo simulation of the executed and received payments, for each repayment option, over the next T days. Each day is considered one time frame. For each repayment option, Q simulations are conducted for all the time frames being considered. The average of these simulations for each time frame is taken as the forecast for executed and received payments for that repayment option.

As can be seen in Figure 3.1, the first step when simulating executed purchases is to generate a Poisson process using intensity $\lambda_{x,t}$. Hence having a good estimate for these intensities is an integral part of this project.


3.2 Estimating Intensities $\lambda_{x,t}$

The intensities in León were estimated using a standard maximum likelihood estimate for each (relevant) hour of the day. In this project the intensity represents the expected number of executed purchases each day. This presents some further challenges due to the nature of the data.

The data contains weekly, monthly and seasonal components, so it does not suffice to average, for instance, over all Mondays, as certain Mondays might be holidays. It was decided that a time series model would be a better fit to the daily number of executed transactions. This captures trends and seasonality in the data and makes more specific predictions about the volume of transactions expected on certain days. How this was achieved is explained in detail in Appendix B.

3.3 Simulation of Executed Purchases

The simulation starts at (a) by simulating the arrival of executed purchases throughout our time frame using the intensities estimated in the previous section. As explained in Appendix B, the residuals of the model fitted to these intensities may or may not require an ARMA model. In the data used for this project the residuals were tested for normality using the Jarque-Bera test, and for all repayment methods there was no evidence to reject the null hypothesis.

In the paper by [6] the number of transactions was generated using a Poisson arrival process with the estimated intensity for the particular time frame. A Poisson process is the natural choice, as it returns integers and is normally used in such situations. A comparison was made between using the Poisson distribution, with intensity $\lambda_{x,t}$, and the normal distribution, with $\mu_x$ and $\sigma_x$ estimated from the residuals. In the case of the normal distribution the random number of transactions was calculated as $\lceil \lambda_{x,t} + r \rceil$, where $r \sim N(\mu_x, \sigma_x)$.

One thousand random integers were generated using both methods and their distributions are depicted in Figure 3.2. It is clear that the normal distribution (with the ceiling function) has slightly heavier tails than the Poisson, which is seen as more desirable for this project as it includes more extreme cases. Hence this method was chosen for the simulations of the number of executed purchases. This can be done easily using the inbuilt functions in MATLAB, as sketched below. The way the executed purchases arrive throughout the day is of no interest; only the total number of executed purchases within the day matters. Hence rather than generating a minute-by-minute vector of purchases, as the author of [6] does, only the total number of arrivals was generated. This relieves us of having to make assumptions about the hourly intensities of the arrival process.
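A minimal sketch of this comparison; lambda, mu and sigma are placeholders here, the actual values being the estimates of Section 3.2 and Appendix B:

lambda = 6000; mu = 0; sigma = 120;                 % placeholder parameters

nPois = poissrnd(lambda, 1000, 1);                  % Poisson arrivals
nNorm = ceil(lambda + normrnd(mu, sigma, 1000, 1)); % normal residual variant

histogram(nPois); hold on; histogram(nNorm);        % cf. Figure 3.2
legend('Poisson', 'Normal with ceiling');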


FIGURE 3.2: Histograms of random numbers generated using the Poisson distribution and the normal distribution.

The next step, (b), is to attach a monetary value to these executed transactions. This is done using bootstrap simulation¹ by drawing with replacement from the historical records of executed purchases. Using this method for generating a monetary value, we avoid making any assumptions about the data, such as normality [3].
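A minimal sketch of this step, assuming histSizes is the historical vector of purchase sizes, filtered to the repayment option being simulated, and nSim is the number of purchases just generated:

idx     = randi(numel(histSizes), nSim, 1);  % uniform indices, with replacement
values  = histSizes(idx);                    % bootstrapped purchase sizes
outflow = sum(values);                       % cash leaving the loan book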

3.4 Simulation of Received Payments

After having matched the executed purchase with a monetary value, the received transactions that will arise from this purchase are simulated in (c). Depending on which repayment option is being simulated, we know the number of received transactions that will arise from this executed transaction.

However, when the received transactions arrive is not set in stone, and bootstrap simulation is again used by drawing from vectors of historical repayment times for each scheduled payment. These received payments are then added to the tally of the future time frames in which they will arrive.

If such a day lies beyond our last forecasted time frame T, then the received payment is not used.

Before the simulation of executed purchases and the received payments arising from them, repayments were simulated for the real pending transactions. Bootstrap simulation is used as explained above and in Appendix A, this time conditioned so that the repayment time chosen is greater than the time already elapsed, as sketched below. These received transactions are then also added to the tallies of the respective time frames.

¹ Bootstrap simulation is explained in more detail in Appendix A.
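A minimal sketch of the conditional draw for one pending installment; histTimes (historical repayment times, in days from notification), elapsed, amount and the tally vector R are assumed names used for illustration:

pool  = histTimes(histTimes > elapsed);  % condition: repayment time > elapsed
t     = pool(randi(numel(pool)));        % one bootstrap draw from the pool
frame = t - elapsed;                     % days from now until the payment
if frame <= T                            % T: last forecasted time frame
    R(frame) = R(frame) + amount;        % add to that time frame's tally
end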


Chapter 4

The Forecast

The forecast is the final step of this algorithm: the simulations generated are used to obtain a Monte Carlo estimate of the executed and received transactions in each time frame up to the chosen time frame T. After summing the transactions from each repayment method, Monte Carlo estimates of the sums of executed and received transactions, $\bar{E}_t$ and $\bar{R}_t$, are obtained by dividing by the number of simulations Q:

$$\bar{E}_t = \frac{1}{Q} \sum_{k=1}^{Q} \sum_{x=1}^{5} E_{k,x,t}, \qquad t = 1, 2, \ldots, T \qquad (4.1)$$

$$\bar{R}_t = \frac{1}{Q} \sum_{k=1}^{Q} \sum_{x=1}^{5} R_{k,x,t}, \qquad t = 1, 2, \ldots, T \qquad (4.2)$$

where $E_{k,x,t}$ is the sum of executed purchases for repayment method x in time frame t in simulation k, and similarly for $R_{k,x,t}$.

The vectors $\bar{E}$ and $\bar{R}$ are the forecasts produced by this algorithm for each time frame. The algorithm is somewhat expensive computationally, hence antithetic variates were used to double the number of simulations at little extra cost in computation time. The method of antithetic variates reduces variance by creating negative dependencies between pairs of replications [4].
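As a minimal sketch, assuming the per-simulation sums are stored in Q-by-5-by-T arrays E and R (a layout chosen here for illustration, not necessarily that of forecasting.m), the estimates (4.1) and (4.2) reduce to:

% E, R: Q-by-5-by-T arrays indexed by simulation k, option x, time frame t
Ebar = squeeze(mean(sum(E, 2), 1));  % sum over the 5 options, average over sims
Rbar = squeeze(mean(sum(R, 2), 1));  % both are T-by-1 forecast vectors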

4.1 Results

The algorithm described above was applied to the simulated data of Chapter 2 and a forecast for the next 365 days was obtained. A forecast for the whole year was generated to see how the algorithm performs when trying to capture the different features of the time series throughout the year. The result can be seen in Figure 4.1. The algorithm's projections follow the same pattern as the historical data, and the received transactions seem to peak at the right time, coinciding with the pay days.


FIGURE 4.1: The forecast for the next 365 days.

The original goal of the algorithm was to produce forecasts for shorter time frames, such as the next three months, hence more focus was placed on the first 97 forecasted time frames, from the 26th of April to the 31st of July. A closer inspection of this period is shown in Figure 4.2. Looking at the historical data, it is clear that the peaks in the received transactions always happen on the pay day. In the three-month forecast shown in Figure 4.2 the peaks sometimes happen a day late, although as can be seen from Figure 4.1 the rest of the peaks happen on the pay day¹. While this might lead to miscalculation of the liquidity needs on a daily basis, it does not lead to bad estimates of the monthly liquidity needs. The reason behind this late peak is explained in the concluding remarks in Chapter 5.

¹ There is one exception to this statement, March's pay day, and it is explained in the concluding remarks.


FIGURE 4.2: The forecast for the next 100 days, from the 26th of April 2016 to the 4th of August 2016.

4.2 The Loan Book

Having obtained these simulations and forecasts, they can now be used to tackle the main problem of the project: projecting the liquidity needs.

In Figure 4.3 the projected liquidity needs for the next forecast year are shown. One can see that at the beginning of each calendar year there actually is growth in the loan book, resulting from repayments of the large number of purchases executed during the Christmas season, as well as from the beginning of the year being a low season for purchases. One can also note monthly peaks of growth in the loan book resulting from many repayments being made close to the pay day.


FIGURE 4.3: Loan book liquidity needs for the next 365 days.

In Figure 4.4 the liquidity in the loan book is shown from the 1st of January 2016 up until the 31st of July 2016. It is assumed that the account had an opening balance of 65,000,000 kr, and by the 25th of April this stood at around 95,000,000 kr. One thousand simulations were generated to simulate the loan book balance until the 31st of July. The balance is expected to stay positive, but in some simulations it dipped below zero, which is something a planner wants to avoid. Using these simulations one can estimate the risk that the loan book balance dips below zero.

Let $B_{k,t}$ denote the balance of the loan book for simulation $k \in \{1, \ldots, Q\}$ at time $t \in \{1, \ldots, T\}$, and let $B_t$ be the actual loan book balance during this period. Then the probability that during this period the loan book will run out of liquidity can be estimated by:

$$P\left[\min_{1 \leq t \leq T}(B_t) < 0\right] \approx \frac{1}{Q} \sum_{k=1}^{Q} \mathbf{1}_{\{\min_{1 \leq t \leq T}(B_{k,t}) < 0\}} \qquad (4.3)$$

In the case depicted in Figure 4.4, the probability of running out of liquidity by the 31st of July was found to be 0.3%.
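A minimal sketch of this estimator, assuming B is a Q-by-T matrix of simulated balance paths (opening balance plus cumulative received minus executed transactions):

ruin  = min(B, [], 2) < 0;  % per path: did the balance ever dip below zero?
pRuin = mean(ruin);         % Monte Carlo estimate of P[min_t B_t < 0]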


FIGURE 4.4: The balance in the loan book for the next 100 days, from the 26th of April 2016 to the 4th of August 2016.

If these projections are used to forecast the liquidity needs on a monthly basis, they can be used to quantify the costs of different strategies for funding the loan book. As explained in Chapter 1, the loan book is financed through loans from the bank, hence minimizing the cost of borrowing is one of the planner's goals.

Figure 4.2 was used as an example to compare the expenses arising from different financing strategies. It was assumed that the balance in the loan book on the 26th of April was all acquired through a loan of 95,700,000. Through the simulations we know that if this is increased to L = 99,128,000, this balance will finance the loan book until the 31st of July with a probability approximately equal to 100%. If it is assumed that repayment does not start within this period, the costs associated with this loan can be calculated as follows:

$$C_1 = L \int_{0}^{T} \left(e^{rt} - 1\right) dt \qquad (4.4)$$

On the other hand, if the projections are used, and if it is assumed that the planner can take smaller loans at the same interest rate and under the same conditions, one can make strategic cash injections to finance the loan book during the same period. The amounts to be injected, and when, were found using a worst case scenario approach from the simulations. The result was three loans which sum up to L, described below:


$L_1$ = 42,545,000, injected on the 10th of May ($T_1$) (4.5)
$L_2$ = 25,036,000, injected on the 15th of June ($T_2$) (4.6)
$L_3$ = 31,547,000, injected on the 13th of July ($T_3$) (4.7)

Hence the cost can now be adjusted to:

$$C_2 = L_1 \int_{T_1}^{T} \left(e^{rt} - 1\right) dt + L_2 \int_{T_2}^{T} \left(e^{rt} - 1\right) dt + L_3 \int_{T_3}^{T} \left(e^{rt} - 1\right) dt \qquad (4.8)$$

Using the second strategy will lead to savings of:

$$C_1 - C_2 = L_1 \int_{0}^{T_1} \left(e^{rt} - 1\right) dt + L_2 \int_{0}^{T_2} \left(e^{rt} - 1\right) dt + L_3 \int_{0}^{T_3} \left(e^{rt} - 1\right) dt \qquad (4.9)$$

since $L = L_1 + L_2 + L_3$.
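Since $\int_a^b (e^{rt}-1)\,dt = (e^{rb}-e^{ra})/r - (b-a)$, the two strategies can be compared in closed form. A minimal sketch follows; the daily rate and the day offsets of the injection dates are illustrative assumptions, not the PSP's actual terms:

r    = 0.05/365;                % assumed daily interest rate
cost = @(L, a, b) L * ((exp(r*b) - exp(r*a))/r - (b - a));  % L*int_a^b(e^{rt}-1)dt

T  = 97;                        % 26 Apr - 31 Jul, in days
C1 = cost(99128000, 0, T);      % one up-front loan of L
C2 = cost(42545000, 14, T) ...  % injection on the 10th of May
   + cost(25036000, 50, T) ...  % injection on the 15th of June
   + cost(31547000, 78, T);     % injection on the 13th of July
savings = C1 - C2;              % cf. equation (4.9)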

FIGURE 4.5: The loan book using the second strategy.


Chapter 5

Conclusion

5.1 Summary

In this project an algorithm tackling the loan book forecasting problem was successfully created and implemented. The forecasts obtained fit the simulated historical data well, and the computation time required for a significant number of simulations, though somewhat long, is still feasible, especially with better hardware. The algorithm can be sped up further if the code is slightly redesigned and, where possible, parallel for-loops are implemented.

In Chapter 4 an example was described of how the results obtained can be used to make decisions about financing the loan book. This was the original goal of the problem, and under the assumptions made about the bank loans the new strategy presented can lead to significant savings.

The model described relies heavily on a good time series fit to the number of transactions being executed. This might prove difficult if there is not enough historical data to work on, though this is unlikely to be the case in a modern, well established tech company. In this project five time series models were fit to the number of transactions, one for each repayment option. In a real life scenario this number will most likely be higher, as it might prove fruitful to stratify the data further according to merchants or groups of merchants. The reasoning behind this is that transactions arising from different merchants may have very different monetary values, and this needs to be taken into account when using the bootstrap simulation technique.

The bootstrap simulation technique requires large pools of historical purchase data. Luckily, for a PSP this will not be a problem to obtain, and the effectiveness of the technique has been well demonstrated in Chapter 4 and Appendix A. The simplicity of the technique and its ease of implementation are what make it so attractive. Computer memory might be an issue depending on the amount of data being used. For this project, even though a significant amount was used, lack of memory was never an issue. In cases where computer memory would be an issue, one can always implement sampling techniques to minimize the size of the data set needed.

Sampling with a bias towards "worst case" scenarios can help ensure that the liquidity needs will not be underestimated.


In Chapter 4 it was described how bootstrap simulation is used to generate the monthly pay day effect found in the received transactions. While this was very effective, there are some months where this effect peaks a day or two late. The reason behind this slight miscalculation is that the number of days between salary dates differs slightly between months, the most extreme case being February. In Figure 4.1 we can see how for March the peak comes two days late, since the repayment time is sampled mostly from months which have 30 or 31 days. This could easily be fixed by further filtering of the data set when sampling: the further filtering would determine whether the original executed purchase happened in a month with 28, 30 or 31 days. Though this might make the data set significantly smaller, one should note that since about 6000 purchases are executed daily, the sample size will always be sufficiently large.

5.2 Future Work

As mentioned, time series modeling was central to the quality of the simulations generated. It is paramount that any future work on this builds the best possible time series fit to the number of transactions in each stratum being considered.

In this project the trend was estimated using a least squares polynomial fit of order one. This fit was very effective on the simulated data, as explained in Appendix B. When choosing the order of the polynomial fit, all prior knowledge about the operations of the PSP should be taken into consideration, for instance whether an aggressive marketing campaign is about to start or many new merchants are going to be added. This might influence the trend, and perhaps a higher order would be necessary.

Improvements to the fit can be made by considering more than one seasonal period. In this project the seasonal period was one year. While this has worked well for the purposes of this project, there might be cases where more than one seasonal period is needed. Also, the seasonal components were assumed to be constant from year to year. While this might well be the case, there are arguments that the seasonal component might also be growing over time: successful retailers focus their marketing spending on the peak seasons and might get better at it from year to year, making the effect of certain holidays more pronounced.

Due to data limitations, the executed transactions in this project were not associated with any customers. The customer profile might provide further insight into when a transaction is paid. Also, one customer might have more than one open purchase at any given time, which would mean the repayment times of all transactions by the same customer are correlated.

Further work on this study must incorporate that useful information.

For pending transactions whose repayment needs to be modeled, the bootstrap sampling could be further conditioned on the repayment times of previous installments, for instance finding the repayment of the third installment given knowledge of the first two.


Appendix A

Bootstrap Historical Simulation

This technique was used in two instances in the algorithm. The robustness of the technique and the simplicity of its implementation are what make it so appealing. The procedure can be used to simulate the distribution of a data set without making assumptions about the data.

The first instance in which bootstrap simulation was used was to obtain a monetary value for each simulated executed transaction. After step (a) in Figure 3.1, an integer for the number of transactions happening during that time frame is obtained. After loading the historical data about the cost of past purchases and filtering the data set for the repayment method currently being simulated, bootstrap simulation is applied and a monetary value is associated with each purchase. The sum of all these executed transactions is the goal of the simulation and will be the cash flowing out of our loan book. This is done for each repayment method.

As has been described already, every received transaction arises from exactly one particular executed transaction. For each repayment method in the project there is a predetermined number of received transactions that will happen. The monetary value of each of these is also known and is a fraction of the executed transaction plus any fees where applicable. What remains is to simulate when these repayments occur. This needs to be done both for executed transactions that have been simulated during the current time frame and for all the real purchases that have not yet been repaid in full. These repayment times were simulated using bootstrapping, by loading historical records that contain the time elapsed until each respective repayment. This method was effective and would have produced good results if the goal of the project were to forecast liquidity needs over long periods. However, as can be seen in Figure A.1, it failed to capture the monthly pay day effect in the received transactions. It is vital that this monthly effect is simulated accurately to make good predictions about the liquidity needs in the short term.


FIGURE A.1: The forecast using bootstrap simulation for the repayment times.

At first it seems that one would have to find a distribution of the percentage of received transactions during each individual month (or at the very least for months of different lengths). Using this, one could then somehow generate repayment times that follow this distribution. This option was found to be difficult and complex, as one has to make sure that the repayment times do not break the conditions of the repayment option.

A far simpler and more effective method was found. Recall that the pay day effect is a result of people who receive their notification to pay on a certain day of the month but choose to wait until their next pay check before paying the monthly installment. Hence this effect is more pronounced for repayments where the notification is received in the middle of the month compared to repayments where the notification is received close to the pay day. As a result, this effect depends on the day of the month on which the purchase was executed, as when repayment notifications are received depends only on the day of purchase. So, to simulate this effect, bootstrap simulation was used, but the historical data set was filtered not only by repayment method but also by the day of the month on which the purchase was executed.

As an example, to simulate the repayment of a simulated purchase of 1000 kr, executed on the 3rd of May 2016 and repaid using the three month campaign, the algorithm loads the historical data set that contains the repayment times of the first, second and third installments of all past transactions that used the 3-month campaign as a repayment method.

So far this method provides us with the forecast in Figure A.1. To simulate the pay day effect, the data set is filtered further to transactions that happened on the 3rd of any month. Due to the large volume of executed transactions happening daily, there is always sufficient data to sample from. While this technique is expensive when it comes to memory, the computation time is not long, and the result can be seen in Figure A.2.

In the case of real purchases that have not yet been repaid, the same procedure was used; however, the data set was filtered even further and the sample was conditioned to be greater than the number of days already elapsed since the purchase, as sketched below.
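A minimal sketch of this filtered, conditioned draw; the record layout (a vector execDay of execution days-of-month and a matrix repay3m of installment repayment times) is assumed for illustration and need not match conditionrandsample.m:

sameDay = execDay == 3;             % purchases executed on the 3rd of a month
pool    = repay3m(sameDay, :);      % their repayment times (n-by-3 matrix)
pool    = pool(pool(:,1) > elapsed, :);  % condition for a pending purchase
                                         % whose first installment is open

k          = randi(size(pool, 1));  % one bootstrap draw, with replacement
repayTimes = pool(k, :);            % times for installments 1, 2 and 3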


FIGURE A.2: The forecast using bootstrap simulation for the repayment times, with the data set filtered according to the day of execution.


Appendix B

Time Series Estimates for $\lambda_{x,t}$

B.1 Elimination of Trend and Seasonal Component

FIGURE B.1: Transactions vs Day for the Invoice Repayment Option

Above is the number of transactions for the invoice repayment method on each day of our historical data. The repayment method is the only category by which the data was stratified, hence a time series model is needed for each repayment method. In the case of this project, because of the way the data was generated, the fitted models will be practically the same for each category.

An approach taken from Dowd was applied. A first glance at our data shows no irregular jumps or discontinuities, but clear signs of seasonality and linear growth. Before fitting a model, the data was de-trended and de-seasoned using algorithms adapted from Section 1.5.2 in [3]. These can be found in deTrendSeason.m and deSeason.m.


A classical decomposition was chosen for the data:

$$X_t = m_t + s_t + Y_t, \qquad t = 1, \ldots, n \qquad (B.1)$$

where

$$E[Y_t] = 0, \qquad (B.2)$$

$$s_{t+d} = s_t, \qquad (B.3)$$

and

$$\sum_{j=1}^{d} s_j = 0. \qquad (B.4)$$

A standard algorithm to obtain de-trended and de-seasoned data goes as follows (a MATLAB sketch of these steps is given after the list):

• Apply a moving average filter to estimate the trend $\hat{m}_t$.

• Estimate the seasonal component for $k = 1, \ldots, d$ by finding the average of the deviations,
$$w_k = \frac{1}{\lfloor (n-k)/d \rfloor} \sum_{j=0}^{\lfloor (n-k)/d \rfloor} \left(x_{k+jd} - \hat{m}_{k+jd}\right).$$
These deviations might not sum to zero, so we get our seasonal component by subtracting their average,
$$\hat{s}_k = w_k - \frac{1}{d} \sum_{i=1}^{d} w_i, \qquad \hat{s}_t = \hat{s}_{t-d}, \quad t > d.$$

• The final step is to re-estimate the trend by fitting a least squares polynomial $\hat{m}_t$ to the de-seasoned data $d_t = x_t - \hat{s}_t$. An appropriate order for the polynomial should be chosen. The residuals are then defined as $Y_t = x_t - \hat{m}_t - \hat{s}_t$.
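A minimal MATLAB sketch of these steps, using the least squares variant for both trend estimates (the choice the empirical comparison below recommends); x is the daily count series (an n-by-1 column) and d the seasonal period, both assumed given:

n = numel(x); t = (1:n)';

p1  = polyfit(t, x, 1);              % first trend estimate (LS, order 1)
dev = x - polyval(p1, t);            % deviations from the trend

w = zeros(d, 1);                     % average deviation per seasonal position
for k = 1:d
    w(k) = mean(dev(k:d:n));
end
s = w - mean(w);                     % seasonal component, sums to zero
s = repmat(s, ceil(n/d), 1);
s = s(1:n);                          % extend/truncate to the full series

p2 = polyfit(t, x - s, 1);           % re-estimate trend on de-seasoned data
Y  = x - polyval(p2, t) - s;         % residuals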

Empirically, however, it proved that replacing the first trend estimate with another least squares polynomial fit to the original time series yields far better results in terms of obtaining Gaussian residuals. This empirical evidence is shown below by comparing the fits and the histogram plots of the residuals using both methods.

FIGURE B.2: Figures showing the fit using MA and LS polyfit for the 1st trend estimate, respectively

From the figures above and below one can immediately say that the fit obtained using a least squares polynomial fit for both estimates of the trend yields the better result. This can also be quantified in terms of the variance obtained: the moving average method gave a variance of 6.58 × 10³, while the least squares method gave a smaller variance of 1.99 × 10³. The residuals also appear to be more Gaussian, with lighter tails.


FIGURE B.3: Figures showing histograms of the residuals obtained using MA and LS polyfit for the 1st trend estimate, respectively

B.2 Testing the Residuals

The aim of the previous section was to achieve a stationary residual time series. Once this has been done successfully, the residuals must be tested for dependence among them. If there is no dependence, then they can be considered the result of independent random variables and need not be modeled further [2]. If this is not the case, then a more complex stationary time series model needs to be fit to the data.

A number of tests are described in [2] to check the hypothesis that the residuals are indeed observations of independent and identically distributed random variables. Judging by the histogram of the residuals above, one would suspect that these residuals are Gaussian, and one can apply tests to confirm or reject this hypothesis. In the case of a rejection, an appropriate model should then be fit to these residuals.

As suggested in [2], for large n the Jarque-Bera test described in [5] should be used to determine whether the data is normal. This can easily be computed in MATLAB, up to a 99.9% level of confidence, using the jbtest function. Conducting this test on the residuals derived in the previous section results in the acceptance of the null hypothesis, and hence no further modeling is necessary.

However, as the data used in this project is simulated and not the actual data that this algorithm will be applied to, a more robust procedure is described here. Given a rejection of the null hypothesis, this procedure can provide a good fit to data which is similar to the simulated data of this project but whose residuals are not independent and identically distributed. It involves fitting multiple ARMA models and then choosing the parameters p and q that minimize the BIC statistic, as sketched below. It is important to note that when attempting to fit any time series model, one has to examine the data and decide on a course of action depending on how the data looks.

References
