Regression-Based Monte Carlo For Pricing High-Dimensional American-Style Options


Niklas Andersson nian0021@student.umu.se

Umeå University
Department of Physics
April 7, 2016

Master’s Thesis in Engineering Physics, 30 hp. Supervisor: Oskar Janson (oskar.janson@cinnober.com)

Abstract

For some derivatives there exists a closed form solution; however, the pricing of high-dimensional American-style derivatives is still today a challenging problem. This project focuses on the derivative called option, and especially the pricing of American-style basket options, i.e. options with both an early exercise feature and multiple underlying assets. In high-dimensional problems, which is definitely the case for American-style options, Monte Carlo methods are advantageous. Therefore, in this thesis, regression-based Monte Carlo has been used to determine early exercise strategies for the option. The well-known Least Squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz (2001) has been implemented and compared to Robust Regression Monte Carlo (RRM) by C. Jonen (2011). The difference between these methods is that robust regression is used instead of least squares regression to calculate continuation values of American-style options. Since robust regression is more stable against outliers, C. Jonen claims that this approach gives better estimates of the option price. It was hard to compare the techniques without the duality approach of Andersen and Broadie (2004); therefore this method was added. The numerical tests then indicate that the exercise strategy determined using RRM produces a higher lower bound and a tighter upper bound compared to LSM. The difference between the upper and lower bound could be up to 4 times smaller using RRM.

Importance sampling and Quasi Monte Carlo have also been used to reduce the variance in the estimation of the option price and to speed up the convergence rate.

Sammanfattning

For some derivatives there exists a closed form solution, but the pricing of high-dimensional American-style derivatives is still a challenging problem. This project focuses on the derivative called option, and especially the pricing of American-style basket options, i.e. options that both have an early exercise feature and are written on several underlying assets. For high-dimensional problems, which is definitely the case for American-style options, Monte Carlo methods are advantageous. In this thesis, regression-based Monte Carlo has therefore been used to determine exercise strategies for the option. The well-known Least Squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz (2001) has been implemented and compared with Robust Regression Monte Carlo (RRM) by C. Jonen (2011). The difference between the methods is that robust regression is used instead of the least squares method to calculate continuation values of American-style options. Since robust regression is more stable against outliers, C. Jonen claims that this method gives better estimates of the option price. It was hard to compare the techniques without the duality approach of Andersen and Broadie (2004); therefore this method was added. The numerical tests then indicate that the exercise strategy determined with RRM produces a higher lower bound and a tighter upper bound compared to LSM. The difference between the upper and lower bound could be up to 4 times smaller with RRM.

Importance sampling and quasi Monte Carlo have also been used to reduce the variance in the estimate of the option price and to speed up the convergence rate.

Contents

1 Introduction
  1.1 Options
  1.2 Pricing Options
2 Theory
  2.1 Monte Carlo
    2.1.1 Kolmogorov's strong law of large numbers
    2.1.2 Monte Carlo simulation
    2.1.3 Central Limit Theorem
    2.1.4 Error estimation
    2.1.5 Advantages of Monte Carlo
  2.2 Dynamics of the stock price
  2.3 Monte Carlo for pricing financial derivatives
  2.4 Robust Regression
  2.5 Duality approach
  2.6 Variance reduction
    2.6.1 Importance sampling
    2.6.2 Importance sampling in finance
  2.7 Quasi Monte Carlo
    2.7.1 Discrepancy and error estimation
    2.7.2 Sobol sequence
    2.7.3 Dimensionality Reduction
3 Method
  3.1 Algorithms
    3.1.1 LSM and RRM
    3.1.2 Duality approach
4 Results
  4.1 LSM vs RRM
    4.1.1 Duality approach
  4.2 Quasi MC
  4.3 Importance Sampling
  4.4 Combinations
5 Discussion
6 Appendix
  6.1 Importance sampling example
  6.2 Not only in-the-money paths

1 Introduction

After the financial crisis that started in 2007, the derivatives markets have been much criticized, and many governments have introduced rules requiring some over-the-counter (OTC) derivatives to be cleared by clearing houses[1]. This thesis has been performed at Cinnober Financial Technology. Cinnober is an independent supplier of financial technology to marketplaces and clearing houses. As more and more complex derivatives are brought to the market, the requirements on the clearing houses to clear and evaluate them increase. Therefore, the pricing of different derivatives is an important part of the financial industry today. A financial derivative is a financial instrument that is built upon a more basic underlying variable like a bond, interest rate or stock. This project focuses on the derivative called option, and specifically the pricing of American-style multi-asset options using regression-based Monte Carlo methods.

1.1 Options

There are two different kinds of options, call options and put options. Buying a call option means buying a contract that gives the right to buy the underlying asset at a specified date, the expiration date T, for a specified price, the strike price K. This is also referred to as taking a long position in a call option, and for this you pay a premium. If you instead take a short position in a call option, you have the obligation to sell the underlying asset at the expiration date T for the strike price K, and for this obligation you receive the premium. On the other hand, a long position in a put option gives the right to sell the underlying asset, whereas the short position has the obligation to buy the underlying asset, see Table 1.

Table 1: Explanation of the states Long/Short in a Call/Put option.

Call
  Long:  Gives the right to buy the underlying asset.
  Short: Obligation to sell if the long position chooses to exercise.
  (The premium is paid by the long position to the short position.)

Put
  Long:  Gives the right to sell the underlying asset.
  Short: Obligation to buy if the long position chooses to exercise.
  (The premium is paid by the long position to the short position.)

The difference between an option and a future/forward contract is that the holder of an option has the right, but not the obligation, to do something, whereas the holder of a future/forward contract is bound by the contract. Let's take a call option on a stock S(t) with strike price K and expiration date T as an example. If the stock price S(T) exceeds the strike price K at the future time T > t, the holder can exercise the option with a profit of S(T) − K. If instead the stock price S(T) is less than or equal to the strike price K, the option is worthless for the holder. The payoff of a call option C with strike price K and expiration date T is thus,

C(S(T), K, T) = \max\{0, S(T) - K\}. \quad (1)

Options that can only be exercised at the expiration date T are called European options, and options that can be exercised at any time are called American options. There are also options whose value depends not only on the value of the underlying asset at expiration but on the whole path of the asset. The barrier option is an example of this kind of option. It either becomes active or stops being active if the underlying asset hits a pre-determined barrier during the lifetime of the option. A knock-in barrier option becomes active only if the barrier is crossed, and a knock-out barrier option pays nothing if the barrier is crossed. The payoff of a down-and-in barrier call option with strike

price K, maturity date T and barrier H is,

C(S(T), K, H, T) = \mathbf{1}_{S(t) \le H} \cdot \max\{0, S(T) - K\}, \quad (2)

where \mathbf{1}_{S(t) \le H} is 1 if S(t) \le H at any time in the interval t \in [0, T] and 0 otherwise.

There are also options on multiple underlying assets; these are called basket options. The payoff of a basket option is typically a function of either the maximum, minimum, arithmetic average or geometric average of the asset prices, see Table 2.

Table 2: Different payoffs of multi-asset Call and Put options with strike price K and D assets.

Type                 Call                                              Put
Maximum:             \max\{0, \max(s_1, ..., s_D) - K\}                \max\{0, K - \max(s_1, ..., s_D)\}
Minimum:             \max\{0, \min(s_1, ..., s_D) - K\}                \max\{0, K - \min(s_1, ..., s_D)\}
Geometric average:   \max\{0, (\prod_{i=1}^{D} s_i)^{1/D} - K\}        \max\{0, K - (\prod_{i=1}^{D} s_i)^{1/D}\}
Arithmetic average:  \max\{0, (\sum_{i=1}^{D} s_i)/D - K\}             \max\{0, K - (\sum_{i=1}^{D} s_i)/D\}
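As an aside, the basket payoffs in Table 2 translate directly into code. The following is a minimal Java sketch of my own (the class and method names are illustrative, not from the thesis) implementing the call variants.

import java.util.Arrays;

/** Basket call payoffs from Table 2; a minimal sketch, names are illustrative. */
public class BasketPayoffs {
    static double maxCall(double[] s, double k) {
        return Math.max(0.0, Arrays.stream(s).max().getAsDouble() - k);
    }
    static double minCall(double[] s, double k) {
        return Math.max(0.0, Arrays.stream(s).min().getAsDouble() - k);
    }
    static double geometricCall(double[] s, double k) {
        // Geometric average computed via logarithms for numerical stability.
        double logAvg = Arrays.stream(s).map(Math::log).average().getAsDouble();
        return Math.max(0.0, Math.exp(logAvg) - k);
    }
    static double arithmeticCall(double[] s, double k) {
        return Math.max(0.0, Arrays.stream(s).average().getAsDouble() - k);
    }

    public static void main(String[] args) {
        double[] s = {90.0, 110.0, 105.0};                // illustrative asset prices
        System.out.println(maxCall(s, 100.0));            // 10.0
        System.out.println(arithmeticCall(s, 100.0));     // ~1.667
    }
}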

1.2 Pricing Options

Pricing European options is often done using the well-known Black-Scholes-Merton model. This model has had a big influence on the financial market, since it provides a theoretical price for European options. The importance of the model was recognized in 1997, when the inventors were awarded the Nobel prize in economics. Pricing American options is harder than pricing European options; in fact, the pricing and optimal exercise of options with early exercise features is one of the most challenging problems in mathematical finance[12]. Before you can value this kind of option you need to determine when the option will be exercised. There has been a lot of research in this area trying to determine the optimal stopping strategy, and it is one topic that this project will focus on. Regression-based Monte Carlo techniques for determining optimal stopping strategies had a breakthrough with the proposal of Longstaff and Schwartz in 2001 [3]. In their method, least squares regression is used to determine exercise dates. C. Jonen published his dissertation in 2011, where he instead suggests using robust regression in the regression step. This project will focus on comparing the least squares method of Longstaff and Schwartz with the robust regression method of C. Jonen [4].

As mentioned, there is no analytic formula for pricing an American option, so one instead turns to numerical methods, like the binomial model or the finite difference method, to solve the partial differential equation describing the price process of the option. These methods work very well for options that have only one underlying asset, but if the option is a multi-asset option with two or more underlying assets these methods lose their strength. Another approach to pricing options is Monte Carlo methods, and this is the approach that will be used in this project. Instead of finding a solution for the price numerically, we simulate the paths of the underlying assets using a stochastic representation and value the option from these paths. We repeat this procedure many times, calculate the average of all these simulations and use this as an estimate of the option price. An advantage of Monte Carlo is that it is a relatively easy procedure which can be applied to price various kinds of derivatives. The disadvantage is that it is a computationally costly method, since stochastic paths have to be calculated, which implies a slow convergence rate. However, the convergence rate O(1/\sqrt{n}) holds for all dimensions, in contrast to deterministic numerical methods, whose convergence rates degrade as the dimension grows. American-style basket options have two sources of high-dimensionality, the underlying stocks and the time to maturity. Therefore, Monte Carlo methods are often the method of choice when pricing American-style basket options[2]. For the most part there is nothing or very little that can be done about the rather slow convergence rate of Monte Carlo; under appropriate conditions quasi Monte Carlo is an exception. We can, however, look for better sampling techniques and variance reduction methods to reduce the variance in the estimation. There are many different types of variance reduction techniques: antithetic variates, control variates, importance sampling, stratified sampling and quasi Monte Carlo are some well known techniques. Which method is best may vary with the type of option. In this project the use of importance sampling will be examined; this is a more complex method and requires some knowledge of the underlying problem, but it has the capacity to produce orders of magnitude of variance reduction. The effect of quasi Monte Carlo will also be studied.

2 Theory

Before I explain how to use Monte Carlo for financial applications, I will start this section by giving the basic theory of Monte Carlo methods in general; this is handled in subsection 2.1. Subsections 2.2 and 2.3 are about how to model stock prices and how to use regression-based Monte Carlo to determine exercise dates for options. The basic idea of robust regression and the extension to duality methods are explained in subsections 2.4 and 2.5, and I will end this section with some theory on importance sampling and quasi Monte Carlo.

2.1 Monte Carlo

2.1.1 Kolmogorov's strong law of large numbers

This law is the main justification of Monte Carlo methods and states that the average of a sequence of iid (independent, identically distributed) variates converges almost surely to the expected value. Given a sequence of iid variates \zeta_i with expectation

E[\zeta_i] = \mu, \quad (3)

define the average as

\bar{X} = \frac{1}{N}\sum_{i=1}^{N} \zeta_i. \quad (4)

Then this average converges almost surely to \mu,

\bar{X} \xrightarrow{a.s.} \mu \quad \text{as } N \to \infty. \quad (5)

2.1.2 Monte Carlo simulation

Let's consider the following integral,

\theta = \int_0^1 f(x)\,dx. \quad (6)

If the function f(x) is complicated and it is hard to find the primitive function F(x), we can instead use Monte Carlo to approximate the integral with \tilde{\theta}_N = \frac{1}{N}\sum_{i=1}^{N} f(U_i), where U_i \sim U(0,1) are random uniform variables with density

g_U(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases} \quad (7)

To see that \tilde{\theta}_N is an estimator of \theta, let's first show that E[f(U)] = \int_0^1 f(x)\,dx:

E[f(U)] = \int_{\mathbb{R}} f(x)\,g_U(x)\,dx
        = \int_{-\infty}^{0} f(x)\cdot 0\,dx + \int_{0}^{1} f(x)\cdot 1\,dx + \int_{1}^{\infty} f(x)\cdot 0\,dx
        = \int_{0}^{1} f(x)\,dx. \quad (8)

The next step is to use Kolmogorov's strong law of large numbers,

\tilde{\theta}_N = \frac{1}{N}\sum_{i=1}^{N} f(U_i) \xrightarrow{a.s.} E[f(U)] = \theta \quad \text{as } N \to \infty. \quad (9)

Hence, by simulating U_1, U_2, ..., U_N, evaluating the function at these points and averaging, we get an unbiased estimator of \theta.

2.1.3 Central Limit Theorem

Let \zeta_1, \zeta_2, ..., \zeta_N be iid random variables with E[\zeta_i] = \mu and Var[\zeta_i] = \sigma^2 < \infty. Then the composite variable below converges in distribution to the standard normal as N increases,

X_N = \frac{\frac{1}{N}\sum_{i=1}^{N}\zeta_i - \mu}{\sigma/\sqrt{N}} \xrightarrow{d} N(0, 1) \quad \text{as } N \to \infty. \quad (10)

2.1.4 Error estimation

Given the Monte Carlo estimator in (9), the central limit theorem gives

\frac{\tilde{\theta}_N - \theta}{\sigma/\sqrt{N}} \xrightarrow{d} N(0, 1) \quad \text{as } N \to \infty, \quad (11)

where \sigma is the standard deviation of f(U). Since \sigma is unknown we have to approximate it with s,

s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(f(U_i) - \tilde{\theta}_N\right)^2}. \quad (12)

By inserting s instead of \sigma in (11) we see that \tilde{\theta}_N \xrightarrow{d} N(\theta, s^2/N). This leads to the definition of the standard error SE, which is a measure of the deviation in \tilde{\theta}_N,

SE = \sqrt{Var(\tilde{\theta}_N)} = \frac{s}{\sqrt{N}}. \quad (13)

We can now create a 100(1-\alpha)% confidence interval for the estimator \tilde{\theta}_N as

\left(\tilde{\theta}_N - z_{1-\alpha/2}\cdot SE;\;\; \tilde{\theta}_N + z_{1-\alpha/2}\cdot SE\right), \quad (14)

where z_{1-\alpha/2} denotes the (1-\alpha/2) quantile of the normal distribution and z_{1-0.05/2} \approx 1.96 for a 95% confidence interval.
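To make the recipe in (6)-(14) concrete, here is a minimal Java sketch of my own (names are illustrative) that estimates the integral of e^x over [0, 1] by Monte Carlo and reports the standard error and a 95% confidence interval; the exact answer is e - 1.

import java.util.Random;

/** Monte Carlo estimate of the integral of exp(x) over [0,1], with standard error. */
public class McIntegration {
    public static void main(String[] args) {
        int n = 1_000_000;
        Random rng = new Random(42);                 // fixed seed for reproducibility
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double y = Math.exp(rng.nextDouble());   // f(U_i) with U_i ~ U(0,1)
            sum += y;
            sumSq += y * y;
        }
        double theta = sum / n;                              // estimator as in (9)
        double s2 = (sumSq - n * theta * theta) / (n - 1);   // sample variance as in (12)
        double se = Math.sqrt(s2 / n);                       // standard error as in (13)
        // 95% confidence interval as in (14), with z_{0.975} ~ 1.96.
        System.out.printf("theta = %.5f +- %.5f%n", theta, 1.96 * se);
    }
}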


2.1.5 Advantages of Monte Carlo

The integral in (6) can also be calculated numerically using, for example, the trapezoidal rule,

\tilde{\theta}_N^{\,t.r.} = \frac{f(0) + f(1)}{2N} + \frac{1}{N}\sum_{i=1}^{N-1} f(i/N). \quad (15)

The error from using the trapezoidal rule is of order O(N^{-2}), which is better than the Monte Carlo estimate, O(N^{-1/2}), so why do we use Monte Carlo? The strength of Monte Carlo is that it is of order O(N^{-1/2}) regardless of the dimensionality of the problem, whereas in dimension d the error of the trapezoidal rule is of order O(N^{-2/d}). This degradation in convergence rate as the dimension of the problem increases is common to all deterministic integration methods[2]. Hence, Monte Carlo methods are superior for evaluating integrals in higher dimensions.

2.2 Dynamics of the stock price

In order to use Monte Carlo methods to price options we need to generate paths of the underlying assets, i.e. stocks. One common assumption in financial applications is that the evolution of the stock price can be represented by a stochastic process, in particular the geometric Brownian motion under risk-neutral valuation,

\frac{dS(t)}{S(t)} = (r - \delta)\,dt + \sigma\,dW(t), \quad (16)

where r is the risk-free interest rate, \delta the yearly dividend yield, \sigma the volatility and W is a Brownian motion. A standard Brownian motion (or Wiener process) is defined by the following conditions:

1. W(0) = 0.

2. The process W has independent increments, i.e. if r < s < t < u then W(u) - W(t) and W(s) - W(r) are independent stochastic variables.

3. For s < t the stochastic variable W(t) - W(s) has a Gaussian distribution N(0, \sqrt{t-s}).

4. W has continuous trajectories.

The solution of (16) above is given by,

S(t_{l+1}) = S(t_l)\, e^{(r - \delta - \frac{1}{2}\sigma^2)\Delta t + \sigma \Delta W}, \quad (17)

and due to conditions 2 and 3 of the Brownian motion, it is possible to simulate the value of a stock at discrete time steps l = 0, ..., L-1 as

S(t_{l+1}) = S(t_l)\, e^{(r - \delta - \frac{1}{2}\sigma^2)\Delta t + \sigma Z(l+1)}, \quad (18)

where Z(1), Z(2), ... are N(0, \sqrt{\Delta t}). For the derivation of equations (16) and (17), I recommend the book "Arbitrage Theory in Continuous Time" by Tomas Björk[13].

To generate paths of multiple assets the multidimensional geometric Brownian motion is used,

\frac{dS_i(t)}{S_i(t)} = (r - \delta_i)\,dt + dX_i(t), \quad i = 1, ..., d, \quad (19)

where r is the risk-free interest rate, and \delta_i and \sigma_i are the yearly dividend yield and the volatility of the i-th asset S_i. X is the d-dimensional Brownian motion with covariance matrix \Sigma_{i,j} = \rho_{i,j}\sigma_i\sigma_j. In the multidimensional Brownian motion the increments are multivariate normally distributed with covariance matrix \Sigma, X(t) - X(s) \sim N(\vec{0}, \sqrt{t-s}\,\Sigma). Hence, the following equation can be used for generating paths of multiple assets,

S_i(t_{l+1}) = S_i(t_l)\, e^{(r - \delta_i - \frac{1}{2}\sigma_i^2)\Delta t + \sqrt{\Delta t}\,\sum_{j=1}^{d} A_{ij} Z_{l+1,j}}, \quad i = 1, ..., d, \quad (20)

where Z_l = (Z_{l1}, ..., Z_{ld}) \sim N(0, 1_d) and A is chosen to be the Cholesky factor of the covariance matrix \Sigma, satisfying AA^T = \Sigma.

For more information and algorithms on how to model dependencies between variables I refer to Glasserman [2] or Jäckel [5].
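As an illustration of (20), the following Java sketch of my own simulates one path of two correlated stocks, with the 2x2 Cholesky factor of \Sigma written out by hand; the parameter values are arbitrary.

import java.util.Random;

/** One path of two correlated GBM stocks via equation (20); a minimal sketch with d = 2. */
public class TwoAssetGbm {
    public static void main(String[] args) {
        double r = 0.05, dt = 1.0 / 12.0;              // risk-free rate, monthly steps
        double[] s = {100.0, 100.0};                   // initial prices (illustrative)
        double[] sigma = {0.2, 0.3};                   // volatilities
        double[] delta = {0.1, 0.1};                   // dividend yields
        double rho = 0.5;                              // correlation between the assets
        // Cholesky factor A of Sigma_{ij} = rho_{ij} sigma_i sigma_j, hard-coded for d = 2:
        double[][] a = {
            {sigma[0], 0.0},
            {rho * sigma[1], Math.sqrt(1.0 - rho * rho) * sigma[1]}
        };
        Random rng = new Random(42);
        for (int l = 0; l < 12; l++) {                 // L = 12 time steps
            double[] z = {rng.nextGaussian(), rng.nextGaussian()};
            for (int i = 0; i < 2; i++) {
                double diff = a[i][0] * z[0] + a[i][1] * z[1];
                s[i] *= Math.exp((r - delta[i] - 0.5 * sigma[i] * sigma[i]) * dt
                        + Math.sqrt(dt) * diff);
            }
        }
        System.out.printf("S1(T) = %.2f, S2(T) = %.2f%n", s[0], s[1]);
    }
}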

2.3 Monte Carlo for pricing financial derivatives

In the world of finance there is a great need to calculate expectations and determine future values of different derivatives. The underlying processes are often simulated using stochastic differentials, and if the expectations are written as integrals the dimensions are often very large. Depending on the problem, the dimensionality is often at least as large as the number of time steps. The strength of Monte Carlo is that it is a very attractive method for solving problems with high dimensionality, and therefore the method is widely used in financial engineering.

To price a financial derivative by Monte Carlo one usually follows these steps:

• Simulate paths of the underlying assets.

• Evaluate the discounted payoffs from the simulated paths.

• Use the average of these discounted payoffs as an estimate of the derivative price.

However, the early exercise feature of an American-style option makes valuation harder, and before we can price an American-style option we need a way to determine when the option will be exercised. When using Monte Carlo we simulate paths of the underlying asset as described in section 2.2 and use these to value the option; hence it may seem an easy problem to determine when the option should be exercised. However, using knowledge of the asset paths and exercising at the optimum is referred to as perfect foresight and tends to overestimate the option price. To better estimate the exercise time one can use parametric approximation to determine exercise regions before the simulation, but in higher-dimensional problems the optimal exercise regions can be hard to approximate using this technique [2, p. 427].

Another approach to estimating the value of American options is regression-based Monte Carlo methods. At each exercise time the holder of the option compares the value of exercising the option immediately with the expected payoff of continuation. The value of exercising the option immediately at time t is called the intrinsic value, whereas the expected payoff of holding on to the option is called the continuation value. The key insight in regression-based Monte Carlo methods is that the continuation value can be estimated using regression. Before I go on and describe the method, let's begin with some assumptions[4].

• (\Omega, \mathcal{F}, \tilde{P}) is a complete probability space, where the time horizon [0, T] is finite and F = \{\mathcal{F}_t \mid 0 \le t \le T\} is the filtration with the \sigma-algebra \mathcal{F}_t at time t.

• There are no arbitrage opportunities in the market and the market is complete. This implies the existence of a unique martingale measure P, which is equivalent to \tilde{P}.

• B_t denotes the value at time t of 1 money unit invested in a riskless money market account at time t = 0, i.e. B_t is described by

dB_t = r_t B_t\,dt, \quad B_0 = 1,

where r_t is the risk-free interest rate at time t. Then D_{s,t} denotes the discount factor given by

D_{s,t} = B_s / B_t, \quad s, t \in [0, T].

In the special case of a constant risk-free rate r, we have B_t = e^{rt} and D_{s,t} = e^{-r(t-s)}.

Furthermore, we let the underlying asset \{S(t), 0 \le t \le T\} be a Markov process that contains all necessary information about the asset price, and we restrict ourselves to the valuation of Bermudan options, i.e. options that can only be exercised at a fixed set of exercise opportunities t_1 < t_2 < \cdots < t_L, and denote by S_l the state of the Markov process at time step l. In the following, the conditional expectation given the Markov process up until time step l is denoted E_l[\cdot] = E[\cdot \mid S_{t_l}]. A fair price of a Bermudan option at time t_0 is then given by the optimal stopping problem

\sup_{\tau \in \mathcal{T}_{0,L}} E_0[D_{0,\tau} Z_\tau], \quad (21)

where \mathcal{T}_{0,L} is the set of all stopping times with values in \{0, ..., L\} and (Z_l)_{0 \le l \le L} is an adapted payoff process. Arbitrage reasoning justifies calling this a fair price for the option[2]. If we let V_l(s) denote the value of the option at t_l given S_l = s, we are interested in the value V_0(S_0), which may be determined recursively as follows:

V_L = Z_L,
V_l = \max(Z_l, E_l[D_{l,l+1} V_{l+1}]), \quad l = L-1, ..., 0. \quad (22)

This is the dynamic programming principle (DPP) in terms of the value process V_l, which is a natural ansatz for solving the optimal stopping problem in (21). We start at the last time step because we know that the value of the American option at this time step equals the value of the European option, which can be determined.

The dynamic programming recursion (22) focuses on the option value, but sometimes it is more convenient to work with the optimal stopping time instead:

\tau_L^* = L,
\tau_l^* = \begin{cases} l, & Z_l \ge E_l[D_{l,\tau_{l+1}^*} Z_{\tau_{l+1}^*}] \\ \tau_{l+1}^*, & \text{otherwise,} \end{cases} \quad l = L-1, ..., 0. \quad (23)

Following the DPP and using a set of N simulated paths \{S_{nl}\}_{n=1,...,N,\, l=0,...,L}, the value at each exercise date for each path can be determined by

V_L^n = Z_L^n,
V_l^n = \max(Z_l^n, C_l(S_{nl})), \quad l = L-1, ..., 0, \quad (24)

where C_l(s) denotes the continuation value in state s at time step l,

C_l(s) = E[D_{l,l+1} V_{l+1}(S_{l+1}) \mid S_l = s], \quad (25)

for l = 0, ..., L-1. The idea of regression-based Monte Carlo methods is to estimate a model function for the continuation value via regression. A model function \hat{C}_l for the continuation value at time step l is given by a linear combination of M basis functions \varphi_m(\cdot), i.e.

\hat{C}_l(s) = \sum_{m=1}^{M} \hat{\beta}_m \varphi_m(s), \quad (26)

where the coefficients \hat{\beta}_m might be determined by solving the least squares problem

\min_{\hat{\beta} \in \mathbb{R}^M} \|C_l - \hat{C}_l\|_2^2. \quad (27)

Numerically, the problem in (27) can be solved by

\min_{\hat{\beta} \in \mathbb{R}^M} \frac{1}{N} \sum_{n=1}^{N} \left(C_l^n - \hat{C}_l(S_{nl})\right)^2, \quad (28)

where N is the number of paths, \hat{C}_l(S_{nl}) = \sum_{m=1}^{M} \hat{\beta}_m \varphi_m(S_{nl}) for n = 1, ..., N, and the C_l^n are the realizations of the continuation value on each path n,

C_l^n = e^{-r\Delta t} V_{l+1}^n = e^{-r\Delta t} \max\{Z_{l+1}^n, \hat{C}_{l+1}(S_{n,l+1})\}, \quad (29)

where Z_l^n is the payoff at time step l of path n. Following this procedure gives an approximation of the option price today, i.e.

\hat{V}_0 = \max\left\{Z_0,\; e^{-r\Delta t} \frac{1}{N} \sum_{n=1}^{N} V_1^n\right\}. \quad (30)

Estimating option prices through (30) has, however, been shown to give poor estimates, and a breakthrough for regression-based Monte Carlo came with the least squares Monte Carlo (LSM) technique proposed by Longstaff and Schwartz in 2001 [3]. They work with the DPP in terms of the optimal stopping time as in (23), and hence try to determine the optimal stopping time for each simulated path by

\tau_L^n = L,
\tau_l^n = \begin{cases} l, & Z_l(S_{nl}) \ge \hat{C}_l(S_{nl}) \\ \tau_{l+1}^n, & \text{otherwise,} \end{cases} \quad l = L-1, ..., 1, \quad (31)

where the dependent variable C_l^n in the regression step is determined by

C_l^n = e^{-r(\tau_{l+1}^n - l)\Delta t} Z_{\tau_{l+1}^n}, \quad n = 1, ..., N, \quad l = 1, ..., L, \quad (32)

instead of as in (29). Once the optimal stopping time for each path has been determined, the option value can be estimated as

\hat{V}_0 = \max\left\{Z_0,\; \frac{1}{N} \sum_{n=1}^{N} e^{-r\tau_1^n \Delta t} Z_{\tau_1^n}^n\right\}. \quad (33)

To evaluate \hat{V}_0 above, one can either use the same set of paths that determined the stopping strategy, or simulate a new set of paths and exercise according to the determined exercise strategy. Longstaff and Schwartz recommend using only in-the-money paths when estimating the continuation value, i.e. only paths that could be exercised at a profit are considered in the regression step.

2.4 Robust Regression

The idea of robust regression is to take outliers into account when calculating the continuation value in the least squares Monte Carlo approach. Therefore the minimization problem in (28) is replaced with

\min_{\hat{\beta} \in \mathbb{R}^M} \frac{1}{N} \sum_{n=1}^{N} \ell\left(C_l^n - \hat{C}_l(S_{nl})\right), \quad (34)

where \ell is a suitable loss function. Thus, in order to implement the robust regression technique in an already running LSM system, we only replace the solver in the regression step. In the following, the residuals are denoted

r_l^n = C_l^n - \hat{C}_l(S_{nl}), \quad l = 1, ..., L-1, \quad n = 1, ..., N.

The loss functions that will be used in this project are the ordinary least squares (OLS), Huber and Jonen loss functions, see Table 3. To get an intuition for the properties of these loss functions, Figure 1 shows their graphs.

Table 3: Different loss functions \ell(\cdot) for robust regression.

OLS:    \ell(r) = r^2
Huber:  \ell(r) = \begin{cases} r^2, & |r| \le \gamma_1 \\ 2\gamma_1 |r| - \gamma_1^2, & |r| > \gamma_1 \end{cases}
Jonen:  \ell(r) = \begin{cases} r^2, & |r| \le \gamma_1 \\ 2\gamma_1 |r| - \gamma_1^2, & \gamma_1 < |r| < \gamma_2 \\ 2\gamma_1\gamma_2 - \gamma_1^2, & |r| \ge \gamma_2 \end{cases}

Figure 1: Illustration of the different loss functions for robust regression; \gamma_1 and \gamma_2 are transition points.

The idea of robust regression is to get a better approximation of the continuation value by giving outliers less weight in the regression step. Outliers arise from strongly fluctuating paths, and the determination of outliers is not trivial[4, p. 21]. The transition points \gamma_1 and \gamma_2 determine which points are considered outliers. The outlier detection procedure recommended by C. Jonen is the following:

r_n^{help} = |r_n|, \quad n = 1, ..., N, \quad r = (r_1, ..., r_N),
r^{help} = \mathrm{sort}(r^{help}),
\gamma_1 = r^{help}_{\lfloor \alpha N \rfloor}, \quad 0 \ll \alpha < 1,
\gamma_2 = r^{help}_{\lfloor \beta N \rfloor}, \quad \alpha < \beta < 1. \quad (35)

The assumption is hence that (1-\alpha) \cdot 100 percent of the data points are outliers. This empirical \alpha-quantile procedure is suggested because we are not familiar with any distribution of the error[4].
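A minimal Java sketch of my own of the quantile procedure in (35), together with Jonen's loss function from Table 3, could look as follows.

import java.util.Arrays;

/** Transition points via empirical quantiles of |residuals|, as in (35); a sketch. */
public class TransitionPoints {
    static double[] gammas(double[] residuals, double alpha, double beta) {
        double[] abs = Arrays.stream(residuals).map(Math::abs).sorted().toArray();
        int n = abs.length;
        double g1 = abs[(int) Math.floor(alpha * n)];
        double g2 = abs[(int) Math.floor(beta * n)];
        return new double[] {g1, g2};
    }

    /** Jonen's loss function from Table 3. */
    static double jonenLoss(double r, double g1, double g2) {
        double a = Math.abs(r);
        if (a <= g1) return r * r;
        if (a < g2)  return 2.0 * g1 * a - g1 * g1;
        return 2.0 * g1 * g2 - g1 * g1;               // constant beyond gamma_2
    }

    public static void main(String[] args) {
        double[] res = {0.1, -0.3, 0.05, 2.5, -0.2, 0.4, -5.0, 0.15, 0.25, -0.1};
        double[] g = gammas(res, 0.8, 0.9);
        System.out.printf("gamma1 = %.2f, gamma2 = %.2f%n", g[0], g[1]);
        System.out.println(jonenLoss(3.0, g[0], g[1]));
    }
}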

2.5 Duality approach

Andersen and Broadie introduced their primal-dual simulation algorithm in 2004 [9], which today is a valuable extension to regression-based Monte Carlo methods. Their method, combined with the LSM method of Longstaff and Schwartz [3], is often the method of choice and is implemented in many option pricing systems of financial institutions today[4]. Following the strategy of Longstaff and Schwartz as in (31) usually produces a low estimator of the option price. This is because the value of the option V_0 is given by the supremum in (21), which is achieved by an optimal stopping time \tau^*; hence following the procedure in (31) produces a low estimate of the option price,

L_0 := E_0[D_{0,\tau} Z_\tau] \le V_0 = \sup_{\tau \in \mathcal{T}_{0,L}} E_0[D_{0,\tau} Z_\tau]. \quad (36)

The idea of Andersen and Broadie's approach is to create a high estimate, in order to constrict the fair option price V_0 between the low and high estimates.

A consequence of the DPP (22) is that

V_l \ge E_l[D_{l,l+1} V_{l+1}(S_{l+1})], \quad (37)

for all l = 0, ..., L-1. This is the defining property of a supermartingale. For more theory on martingales I refer to [13]. Due to (37) we can derive a dual problem for pricing options with an early exercise feature as follows[4].

Let \mathcal{H}^1 be the space of martingales M = (M_l)_{0 \le l \le L} for which \sup_{0 \le l \le L} |M_l| \in L^p, where L^p denotes the space of all random variables with finite p-th moment, p \ge 1. For any martingale M \in \mathcal{H}^1 we have

\sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau]
= \sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau - M_\tau + M_\tau]
= \sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau - M_\tau] + M_0
\le E_0\left[\max_{l=1,...,L}(D_{0,l} Z_l - M_l)\right] + M_0, \quad (38)

where the second equality follows from the martingale property and the optional sampling theorem. As the upper bound holds for any martingale M, the option value may be estimated by

\sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau] \le \inf_{M \in \mathcal{H}^1} \left\{ E_0\left[\max_{l=1,...,L}(D_{0,l} Z_l - M_l)\right] + M_0 \right\}. \quad (39)

The right-hand side of (39) is called a dual problem for pricing options with an early exercise feature. One can show that (39) holds with equality if the optimal martingale is found; to show this, one may consider the Doob-Meyer decomposition of the supermartingale (D_{0,l} V_l)_{0 \le l \le L}, see Rogers [14].

Finding the optimal martingale appears to be as difficult as solving the original stopping problem, but if we can find a martingale \hat{M} that is close to the optimal martingale we can use

E_0\left[\max_{l=1,...,L}(D_{0,l} Z_l - \hat{M}_l)\right] + \hat{M}_0 \quad (40)

to estimate an upper bound for the option price. A suboptimal exercise policy provides a lower bound, and by extracting a martingale from this suboptimal policy the dual value complements the lower bound with an upper bound. Thus the strategy becomes [4]:

1. Find a stopping policy \tau with values in \{1, ..., L\} and a martingale \hat{M} that are optimal in the sense that they are good approximations to the optimal stopping time \tau_1^* and the optimal martingale M^*, respectively.

2. Calculate a lower bound L_0 and an upper bound U_0 for the fair value of an American-style option by

L_0 := E_0[D_{0,\tau} Z_\tau] \le \sup_{\tau \in \mathcal{T}_{1,L}} E_0[D_{0,\tau} Z_\tau] \le E_0\left[\max_{l=1,...,L}(D_{0,l} Z_l - \hat{M}_l)\right] + \hat{M}_0 =: U_0. \quad (41)

The algorithm for this approach and an example of how to find a martingale are described in section 3.1.2. When pricing high-dimensional options it is often hard to calculate any benchmark values; therefore the method of Andersen and Broadie is often implemented so that we are not in the dark with our calculations. Another reason for using this method is that the difference between the lower and upper bounds can be seen as a measure of how well we have estimated the optimal stopping time: the smaller the difference, the better the estimate. Therefore we can use this method to compare and evaluate least squares Monte Carlo (LSM) against robust regression Monte Carlo (RRM).

2.6 Variance reduction

Let \theta be the parameter of interest, such that \theta = E[X], where we can simulate X_1, X_2, ..., X_N and use the average of these as an estimator \hat{\theta}_N of \theta. To make \hat{\theta}_N a good estimator we need the standard error SE = \sqrt{Var(X)/N} to be small. This can be done by increasing the number of samples N, but we notice that in order to increase the precision by a factor of 2 we need to increase the number of samples by a factor of 4. Another strategy is to work with variance reduction techniques: we cannot decrease Var[X] itself, but we can find another variable Y such that E[Y] = E[X] and Var[Y] < Var[X]. There are many different variance reduction techniques that can be used: antithetic variables, control variates and stratified sampling, to mention some. The method that will be examined and used in this project is importance sampling, which is a more complex method and requires some knowledge of the particular problem, but it has the capacity to reduce the variance significantly.

2.6.1 Importance sampling

The standard Monte Carlo approach will have problems estimating \theta = E_f[h(X)] = \int h(x) f(x)\,dx if it is very unlikely that h(x) is obtained under the probability density f(x), i.e. only in very rare cases do we observe h(x). If we instead can find a probability density g(x) that makes it more likely that we obtain h(x), we can observe that

\theta = E_f[h(X)] = \int h(x) f(x)\,dx = \int h(x) \frac{f(x)}{g(x)} g(x)\,dx = E_g\left[h(X)\frac{f(X)}{g(X)}\right],

where X now has the density g instead of f. So instead of using \hat{\theta} = \frac{1}{N}\sum_{i=1}^{N} h(X_i), where the X_i are sampled from the density f(x), we can use

\hat{\theta}_{IS} = \frac{1}{N}\sum_{i=1}^{N} h(X_i)\frac{f(X_i)}{g(X_i)}

as our estimator, where the X_i are sampled under g(x). The weight f(X_i)/g(X_i) is called the likelihood ratio, and we should try to find g(x) \propto h(x) f(x) to obtain a large variance reduction using this technique.

In order to select a proper density g, the exponentially tilted density is often used,

g(x) = f_t(x) = \frac{e^{tx} f(x)}{M(t)}, \quad (42)

where -\infty < t < \infty and M(t) is the moment generating function M(t) = E[e^{tX}]. See the Appendix for an example of how to use importance sampling.
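As an illustration (my own example, not from the thesis), consider estimating the rare-event probability P(X > 4) for X ~ N(0, 1). For a Gaussian, exponential tilting as in (42) amounts to shifting the mean from 0 to t; the sketch below samples from N(4, 1) and reweights with the likelihood ratio f(x)/g(x) = e^{-tx + t^2/2} for t = 4.

import java.util.Random;

/** Importance sampling for P(X > 4), X ~ N(0,1), via a mean shift (exponential tilting). */
public class TiltedNormal {
    public static void main(String[] args) {
        int n = 100_000;
        double t = 4.0;                       // tilt parameter = shifted mean
        Random rng = new Random(42);
        double plain = 0.0, tilted = 0.0;
        for (int i = 0; i < n; i++) {
            // Plain Monte Carlo: almost never observes the event.
            if (rng.nextGaussian() > 4.0) plain += 1.0;
            // Importance sampling: draw from N(t, 1) and reweight.
            double x = t + rng.nextGaussian();
            if (x > 4.0) tilted += Math.exp(-t * x + 0.5 * t * t);  // likelihood ratio
        }
        System.out.printf("plain MC:  %.3e%n", plain / n);
        System.out.printf("tilted IS: %.3e%n", tilted / n);         // ~3.17e-5
    }
}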

2.6.2 Importance sampling in finance

The idea of importance sampling here is to change the drift of the underlying assets and drive the paths into important regions. We would like to drive the paths in such a way that early exercise is enforced and zero-paths vanish, i.e. paths that lie out of the money at maturity. To get a feeling for how importance sampling can be used, and for the advantages of the method, I have created the plots in Figure 2, where the paths in the left panel have a change of drift such that they go down to a barrier and up to the strike in 50 time steps. This can be compared to the right panel, where no importance sampling is used. If we, for example, want to calculate the price of a down-and-in barrier option, it would of course be favorable to use the paths in the left panel.

Figure 2: Simulated price paths of 10 stocks, with importance sampling to the left and without to the right.

Suppose the underlying asset S is modeled through the geometric Brownian motion GBM(\mu, \sigma^2) with drift \mu and volatility \sigma. We can write this as

S(t_n) = S(0)\, e^{L_n}, \quad L_n = \sum_{i=1}^{n} X_i, \quad (43)

with X_i \sim N(\tilde{\mu}, \tilde{\sigma}^2), where \tilde{\mu} = (\mu - 0.5\sigma^2)\Delta t, \tilde{\sigma}^2 = \sigma^2 \Delta t and \Delta t = t_{n+1} - t_n. From this we know that the probability density function of X_i is

f(x) = \frac{1}{\tilde{\sigma}\sqrt{2\pi}}\, e^{-\frac{(x - \tilde{\mu})^2}{2\tilde{\sigma}^2}}. \quad (44)

If we then change the drift \tilde{\mu} according to some proposed strategy and use the drift \tilde{\mu}^*, the probability density function for this new drift becomes

g(x) = \frac{1}{\tilde{\sigma}\sqrt{2\pi}}\, e^{-\frac{(x - \tilde{\mu}^*)^2}{2\tilde{\sigma}^2}}. \quad (45)

The likelihood ratio for this change of measure is

\prod_{i=1}^{n} \frac{f(X_i)}{g(X_i)}. \quad (46)

Assuming that the expectation E_f exists and is finite, the expectation of any function h can then be calculated as

E_f[h(S_{t_1}, S_{t_2}, ..., S_{t_n})] = E_g\left[h(S_{t_1}, S_{t_2}, ..., S_{t_n}) \prod_{i=1}^{n} \frac{f(X_i)}{g(X_i)}\right], \quad (47)

where E_f denotes the expectation under the original measure and E_g the expectation under the new measure. For more information and examples on how to use importance sampling, see [2]. To obtain a large variance reduction using this technique we need to optimize the change of drift for the problem at hand; how to optimize the change of drift can be seen in [4]. In this project a simpler approach will be investigated, where the paths of an out-of-the-money option are driven into in-the-money regions. Once the paths are in-the-money they evolve according to the ordinary geometric Brownian motion.
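A minimal sketch of this drift change (my own, single asset, with an arbitrary shift): each increment is drawn with a shifted mean \tilde{\mu}^*, and the per-step likelihood ratios (46) are accumulated so the payoff can be reweighted as in (47).

import java.util.Random;

/** One GBM path with a changed drift plus its accumulated likelihood ratio; a sketch. */
public class DriftChange {
    public static void main(String[] args) {
        double mu = 0.05, sigma = 0.2, dt = 1.0 / 50.0;
        double muT  = (mu - 0.5 * sigma * sigma) * dt;   // original increment mean
        double muT2 = muT + 0.02;                        // shifted mean (illustrative)
        double var = sigma * sigma * dt;
        Random rng = new Random(42);
        double s = 100.0, lr = 1.0;                      // stock price and likelihood ratio
        for (int i = 0; i < 50; i++) {
            double x = muT2 + Math.sqrt(var) * rng.nextGaussian();  // increment under g
            s *= Math.exp(x);
            // f(x)/g(x) for two normals with equal variance and means muT, muT2:
            lr *= Math.exp((-(x - muT) * (x - muT) + (x - muT2) * (x - muT2)) / (2 * var));
        }
        // The discounted payoff of this path would be multiplied by lr, as in (47).
        System.out.printf("S(T) = %.2f, likelihood ratio = %.4f%n", s, lr);
    }
}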

2.7 Quasi Monte Carlo

Quasi Monte Carlo, or low discrepancy, methods differ from ordinary Monte Carlo in that they make no attempt to mimic randomness. Let's consider the example of calculating the discounted payoff of a European arithmetic put option on d assets,

h(s_1(T), ..., s_d(T)) = e^{-rT} \cdot \max\left\{K - \frac{1}{d}\sum_{i=1}^{d} s_i(T),\; 0\right\}. \quad (48)

The price can then be calculated as

E[h(s_1, ..., s_d)] = E\left[e^{-rT} \max\left(K - \frac{1}{d}\sum_{i=1}^{d} s_i(0)\, e^{(r - 0.5\sigma_i^2)T + \sigma_i\sqrt{T}\,\Phi^{-1}(U_i)},\; 0\right)\right] = E[f(U_1, ..., U_d)] = \int_{[0,1)^d} f(x)\,dx. \quad (49)

Instead of generating pseudo random numbers U_i \sim U(0,1) and estimating the integral with \frac{1}{N}\sum_{i=1}^{N} f(U_i), let's choose X_1, X_2, ..., X_N carefully to fill the hypercube [0,1)^d evenly and use these to estimate the integral as \frac{1}{N}\sum_{i=1}^{N} f(X_i). A sequence that evenly fills the d-dimensional hypercube [0,1)^d is called a low discrepancy sequence.

2.7.1 Discrepancy and error estimation

Discrepancy is a measure of how uniformly a sequence is distributed in the unit hypercube. Given a collection \mathcal{A} of subsets of [0,1)^d, the discrepancy of the point set \{x_1, ..., x_n\} relative to \mathcal{A} is

D(x_1, ..., x_n; \mathcal{A}) = \sup_{A \in \mathcal{A}} \left| \frac{\#\{x_i \in A\}}{n} - \mathrm{vol}(A) \right|, \quad (50)

where \#\{x_i \in A\} denotes the number of x_i contained in A and \mathrm{vol}(A) denotes the volume of A. Taking \mathcal{A} to be the collection of all rectangles in [0,1)^d of the form

\prod_{j=1}^{d} [u_j, v_j), \quad 0 \le u_j < v_j \le 1, \quad (51)

yields the ordinary discrepancy D(x_1, ..., x_n). Restricting \mathcal{A} to rectangles of the form

\prod_{j=1}^{d} [0, u_j) \quad (52)

defines the star discrepancy D^*(x_1, ..., x_n) [2, p. 284]. Suppose that we fix an infinite sequence x_1, x_2, x_3, ... where every initial subsequence x_1, ..., x_n has a low discrepancy. Such a sequence is called a low discrepancy sequence if the star discrepancy D^*(x_1, ..., x_n) is of order O\left(\frac{(\log n)^d}{n}\right). The star discrepancy is a central part of error bounds for quasi Monte Carlo integration, and an upper bound for the integration error is given by the Koksma-Hlawka inequality. The Koksma-Hlawka inequality says that if f has a bounded variation V(f) in the sense of Hardy and Krause on [0,1)^d, then for any sequence X_1, ..., X_N on [0,1)^d the following inequality holds[2, p. 288]:

\left| \frac{1}{N}\sum_{i=1}^{N} f(X_i) - \int_{[0,1)^d} f(x)\,dx \right| \le V(f)\, D^*(X_1, ..., X_N). \quad (53)

This inequality can be compared to the error information available in standard Monte Carlo. From the central limit theorem we know that

\left| \frac{1}{N}\sum_{i=1}^{N} f(U_i) - \int_{[0,1)^d} f(u)\,du \right| \le z_{\delta/2} \cdot \frac{\sigma_f}{\sqrt{N}}, \quad (54)

where U_i \sim U(0,1). Comparing the standard Monte Carlo error with quasi Monte Carlo, it can be seen that in both cases the error bound is a product of two terms, one depending on the integrand f and the other on the properties of the sequence. Both V(f) and D^*(X_1, ..., X_N) in (53) are extremely hard to compute, which means that the Koksma-Hlawka inequality has limited practical use as an error bound. This leads to problems in comparing a quasi Monte Carlo estimate with a standard Monte Carlo estimate[12, p. 15-16].

2.7.2 Sobol sequence

The Sobol sequence is a low discrepancy sequence, and it is probably the most used sequence for high-dimensional integration[12]. For more information on how to create a Sobol sequence and the theory behind it I refer to P. Glasserman[2] or P. Jäckel[5]. In this section I will instead show why Sobol sequences are a better choice for high-dimensional integration than other low discrepancy techniques. The aim of a low discrepancy sequence is to fill a given domain as evenly and homogeneously as possible; however, many techniques lose their capability to produce homogeneous sequences as the dimension increases [5]. To demonstrate why Sobol's sequence is often preferred over other techniques, like Halton's method, I have created Figures 4 and 5, where I have plotted the first 1000 points of the Sobol and Halton sequences projected onto two adjacent dimensions. The result of using a pseudo random number generator is illustrated in Figure 3, where there is no difference between low and high dimensions. The result of using Sobol's sequence is seen in Figure 4 and the result of using Halton's sequence in Figure 5. One can see that the Halton sequence does not fill the domain evenly in higher dimensions, while Sobol's sequence still does.

Figure 3: Pseudo random sequence.

Figure 4: Sobol's sequence.

Figure 5: Halton's sequence.
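The thesis mentions library support in Java for Sobol numbers without naming the library; assuming Apache Commons Math (an assumption on my part), a minimal sketch generating the points behind a plot like Figure 4 could look as follows.

import org.apache.commons.math3.random.SobolSequenceGenerator;

/** First 1000 Sobol points in 5 dimensions; assumes the commons-math3 dependency. */
public class SobolDemo {
    public static void main(String[] args) {
        SobolSequenceGenerator sobol = new SobolSequenceGenerator(5);
        for (int i = 0; i < 1000; i++) {
            double[] x = sobol.nextVector();           // point in [0,1)^5
            // Print the projection onto dimensions 4 and 5, as in Figure 4.
            System.out.printf("%.6f %.6f%n", x[3], x[4]);
        }
    }
}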

2.7.3 Dimensionality Reduction

As mentioned earlier, the dimensionality of the problem can be very large when pricing American-style basket options. There are, however, techniques that can be used to lower the effective dimension of the problem. Principal component analysis can be used to reduce the number of state variables, whereas the Brownian bridge construction can be used to reduce the time dimension. This is not something that this project focuses on, but I will explain the Brownian bridge technique briefly and refer to [2] or [12] for further reading.

The Brownian motion W is the driving process when simulating stock paths as in (17). It can easily be generated from left to right by the recursion

W(t_{i+1}) = W(t_i) + \sqrt{t_{i+1} - t_i}\; z_{i+1}, \quad (55)

where z_{i+1} is a standard normal random variable and t_{i+1} > t_i. The Brownian bridge construction provides an alternative way of generating the Brownian motion, where the terminal node W(t_N) of each path is determined first. All the intermediate nodes are then added conditional on the already known realizations of the path. Suppose that we know two values of the Brownian motion, W(t_i) and W(t_j); we can then use the Brownian bridge construction to calculate an intermediate value W(t_k) as

W(t_k) = \frac{t_j - t_k}{t_j - t_i} W(t_i) + \frac{t_k - t_i}{t_j - t_i} W(t_j) + \sqrt{\frac{(t_k - t_i)(t_j - t_k)}{t_j - t_i}}\; z_k, \quad (56)

where t_i < t_k < t_j and z_k is a standard normal random variable. Figure 6 illustrates the principle.

Figure 6: Illustration of the Brownian bridge construction.

This implies that we can simulate a path of the Brownian motion in any time order. Since many options are more sensitive to the terminal value of the underlying than to intermediate states, the variance is concentrated in the large time steps. We can then use the first dimensions of the quasi sequence to control much of the generated path and use pseudo random numbers for the intermediate steps, hence obtaining a lower effective dimensionality[12].
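A minimal Java sketch of my own of (56): the terminal node is drawn first, then the midpoint is filled in conditionally.

import java.util.Random;

/** Brownian bridge: draw W(T) first, then fill in an intermediate node via (56). */
public class BrownianBridge {
    public static void main(String[] args) {
        Random rng = new Random(42);
        double ti = 0.0, tk = 0.5, tj = 1.0;
        double wi = 0.0;                                       // W(0) = 0
        double wj = Math.sqrt(tj - ti) * rng.nextGaussian();   // terminal node W(1)
        // Intermediate node conditional on W(ti) and W(tj), equation (56):
        double wk = (tj - tk) / (tj - ti) * wi
                  + (tk - ti) / (tj - ti) * wj
                  + Math.sqrt((tk - ti) * (tj - tk) / (tj - ti)) * rng.nextGaussian();
        System.out.printf("W(0.5) = %.4f given W(1) = %.4f%n", wk, wj);
    }
}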

3 Method

This thesis has been performed at Cinnober Financial Technology, where I have implemented a system for pricing American-style options on multiple underlying assets using regression-based Monte Carlo. The aim of the first part of the project was to make a comparative study of the LSM method and the RRM method. The LSM method of Longstaff and Schwartz was implemented with the possibility to replace the least squares solver with robust regression. To compare the results we focus our attention on the duality approach of Andersen and Broadie. The second part of the project focused on convergence rates; therefore the possibility to use quasi Monte Carlo and importance sampling was implemented in the system as part of the investigation.

I have also built an n-dimensional extension of the lattice binomial method according to [6], which has been used for comparison and verification. This method works very well for the one- and two-asset cases, but with more assets the computational time explodes. For high-dimensional valuation problems, the binomial method is not practically useful.

In all the numerical investigations I have assumed that the underlying assets follow a multidimensional geometric Brownian motion as in (16). Monte Carlo algorithms are highly dependent on the quality of the random number generator, and creating pseudo random numbers is still today not a trivial thing. I have used the Mersenne Twister (MT19937) generator to produce uniform pseudo random numbers, and the transformation to normal random numbers is made through the Box-Muller method. All code has been written in Java, which provides already built methods for generating both pseudo random numbers and Sobol low discrepancy numbers. I could also use built-in solvers for least squares problems. I did not, however, find any ready-made implementation of robust regression, so I had to implement it myself following the algorithm in [4].

When working with Monte Carlo and random numbers it is favorable to set a seed in the pseudo random number generator, and I have used the linear congruential method to produce a seed vector,

x_0 = (a s_0 + b) \bmod M, \quad x_i = (a x_{i-1} + b) \bmod M, \quad (57)

where i = 1, ..., 623, s_0 = \text{seed} = 5, a = 214013, b = 2531011, M = 429496729.
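A minimal Java sketch of the seed-vector recursion (57), using the constants above; 624 values are produced, which matches the 624-word state that MT19937 expects (my reading of why i runs to 623).

/** Seed vector via the linear congruential recursion (57); constants from the thesis. */
public class SeedVector {
    public static void main(String[] args) {
        long a = 214013L, b = 2531011L, m = 429496729L, s0 = 5L;
        long[] x = new long[624];                  // 624 words, matching MT19937's state size
        x[0] = (a * s0 + b) % m;
        for (int i = 1; i < 624; i++) {
            x[i] = (a * x[i - 1] + b) % m;
        }
        System.out.println("x[0] = " + x[0] + ", x[623] = " + x[623]);
    }
}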

3.1 Algorithms

3.1.1 LSM and RRM

The LSM and RRM algorithms can be summarized as follows:

1. Generate N_0 independent paths \{S_{nl}\}_{n=1,...,N_0,\, l=1,...,L} according to the multivariate geometric Brownian motion.

2. At the terminal nodes, loop over all paths and set CF_n = Z_L(S_{nL}) and \tau_n = L.

3. Apply backward induction for l = L-1 to 1:
   • set J = 0, and for n = 1 to N_0:
     – if S_{nl} is in-the-money, then
       ∗ set J = J + 1 and \pi(J) = n,
       ∗ fill the regressor matrix A_{Jm} = \varphi_m(S_{nl}), m = 1, ..., M,
       ∗ fill the vector of regressands b_J = e^{-r(\tau_n - l)\Delta t} CF_n;
   • perform the regression to calculate the coefficients \hat{\beta}_{ml};
   • for j = 1 to J do
     – \hat{C} = \sum_{m=1}^{M} \hat{\beta}_{ml} A_{jm},
     – if Z_l(S_{\pi(j)l}) \ge \hat{C}, set CF_{\pi(j)} = Z_l(S_{\pi(j)l}) and \tau_{\pi(j)} = l.

4. Calculate \hat{C}_0 = \frac{1}{N_0}\sum_{n=1}^{N_0} e^{-r\tau_n\Delta t} CF_n.

5. Calculate the value of the option by \hat{V}_0 = \max(Z_0, \hat{C}_0).

Note that the last step of this algorithm implies that the option can be exercised at time 0, i.e. the option can be bought and exercised at the same time step. The number of exercise dates is determined in advance in the case of Bermudan options, and in my numerical investigations I did not allow exercising at time 0; instead the option was evaluated as \hat{V}_0 = \hat{C}_0. As can be seen in the algorithm above, we can easily switch between robust regression and least squares regression once the algorithm has been implemented: just change the regression part where the coefficients \hat{\beta}_{ml} are calculated.

If we would like to use another set of paths \{S^2_{nl}\}_{n=1,...,N_1,\, l=1,...,L} to evaluate the option, and hence separate the determination of the coefficients from the evaluation, we can easily do so. Follow steps 1-3 in the algorithm above to determine the regression coefficients at each time step, but replace steps 4 and 5 with:

4. Generate a new set of independent paths \{S^2_{nl}\}_{n=1,...,N_1,\, l=1,...,L} according to the multivariate geometric Brownian motion.

5. For n = 1 to N_1 do
   • set l = 1 and \hat{C} = \sum_{m=1}^{M} \hat{\beta}_{ml} \varphi(S^2_{nl});
   • while (Z(S^2_{nl}) \le 0 or Z(S^2_{nl}) < \hat{C}) and l \ne L do
     – update l = l + 1 and \hat{C} = \sum_{m=1}^{M} \hat{\beta}_{ml} \varphi(S^2_{nl});
   • set CF^*_n = Z(S^2_{nl}) and \tau^*_n = l.

6. Calculate \hat{C}_0 = \frac{1}{N_1}\sum_{n=1}^{N_1} e^{-r\tau^*_n\Delta t} CF^*_n.

7. Calculate the value of the option by \hat{V}_0 = \max(Z_0, \hat{C}_0).
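Putting steps 1-5 together, a compact Java skeleton of the backward induction (my own sketch, not the thesis implementation; it prices a single-asset Bermudan put for brevity and reuses the OlsRegression helper sketched in section 2.3) might look like this.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** LSM for a single-asset Bermudan put; a compact sketch, not the thesis implementation. */
public class LsmSketch {
    static final int N = 100_000, L = 9;                 // paths, exercise dates
    static final double S0 = 100, K = 100, R = 0.05, SIGMA = 0.2, T = 3.0;
    static final double DT = T / L;

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] s = new double[N][L + 1];
        for (int n = 0; n < N; n++) {                    // step 1: simulate GBM paths
            s[n][0] = S0;
            for (int l = 1; l <= L; l++)
                s[n][l] = s[n][l - 1] * Math.exp((R - 0.5 * SIGMA * SIGMA) * DT
                        + SIGMA * Math.sqrt(DT) * rng.nextGaussian());
        }
        double[] cf = new double[N];
        int[] tau = new int[N];
        for (int n = 0; n < N; n++) { cf[n] = payoff(s[n][L]); tau[n] = L; }  // step 2
        for (int l = L - 1; l >= 1; l--) {               // step 3: backward induction
            List<Integer> itm = new ArrayList<>();       // in-the-money paths only
            for (int n = 0; n < N; n++) if (payoff(s[n][l]) > 0) itm.add(n);
            if (itm.size() < 3) continue;
            double[] x = new double[itm.size()], y = new double[itm.size()];
            for (int j = 0; j < itm.size(); j++) {
                int n = itm.get(j);
                x[j] = s[n][l];
                y[j] = Math.exp(-R * (tau[n] - l) * DT) * cf[n];   // discounted cash flow
            }
            double[] beta = OlsRegression.fit(x, y);     // regression step (OLS or robust)
            for (int n : itm) {
                double cont = beta[0] + beta[1] * s[n][l] + beta[2] * s[n][l] * s[n][l];
                if (payoff(s[n][l]) >= cont) { cf[n] = payoff(s[n][l]); tau[n] = l; }
            }
        }
        double c0 = 0;                                   // step 4
        for (int n = 0; n < N; n++) c0 += Math.exp(-R * tau[n] * DT) * cf[n];
        System.out.printf("LSM price ~ %.4f%n", c0 / N); // step 5, without time-0 exercise
    }

    static double payoff(double s) { return Math.max(K - s, 0.0); }
}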

3.1.2 Duality approach

In the following subsection I will go through the Andersen and Broadie algorithm that I use to create a lower and upper estimate of the option price.

1. Run the LSM / RRM method with a set of paths \{S^0_{nl}\}_{n=1,...,N_0,\, l=1,...,L} and determine the coefficients that define the stopping criterion \tau at each time step, given by \tau_l = \inf\{k \ge l \mid Z_k \ge \hat{C}_k\}.

2. Estimate the lower bound L_0 by simulating a new set of paths \{S^1_{nl}\}_{n=1,...,N_1,\, l=1,...,L} and stopping according to \tau_1:

\hat{L}_0 = \frac{1}{N_1}\sum_{n=1}^{N_1} e^{-r\tau_1^n \Delta t} Z^n_{\tau_1^n}. \quad (58)

A valid 100(1-\alpha)% confidence interval for L_0 can be created by

\hat{L}_0 \pm z_{1-\alpha/2}\,\sqrt{\hat{\sigma}_L^2 / N_1}, \quad (59)

with \hat{\sigma}_L the estimated standard deviation of \hat{L}_0 and z_{1-\alpha/2} the (1-\alpha/2) quantile of the standard normal distribution.

3. The upper bound U_0 is given as

U_0 = E_0\left[\max_{l=1,...,L}\left(e^{-rl\Delta t} Z_l - M_l\right)\right] + M_0 =: \Delta_0 + M_0, \quad (60)

with M_0 = E_0[e^{-r\tau_1\Delta t} Z_{\tau_1}],

M_l = M_{l-1} + E_l[e^{-r\tau_l\Delta t} Z_{\tau_l}] - E_{l-1}[e^{-r\tau_l\Delta t} Z_{\tau_l}] =: M_{l-1} + \delta_l, \quad l = 1, ..., L,

and

E_l[e^{-r\tau_l\Delta t} Z_{\tau_l}] = \begin{cases} e^{-rl\Delta t} Z_l, & \text{if } Z_l \ge \hat{C}_l \\ E_l[e^{-r\tau_{l+1}\Delta t} Z_{\tau_{l+1}}], & \text{if } Z_l < \hat{C}_l. \end{cases} \quad (61)

To estimate the upper bound U_0 we simulate another set of N_2 paths \{S^2_{nl}\}_{n=1,...,N_2,\, l=0,...,L} to approximate \Delta_0 in (60) by

\hat{\Delta}_0 := \frac{1}{N_2}\sum_{n=1}^{N_2} \Delta_n, \quad (62)

where \Delta_n = \max_{l=1,...,L}(e^{-rl\Delta t} Z^n_l - M^n_l), M^n_l = M^n_{l-1} + \delta^n_l, and \delta^n_l is evaluated by

\delta^n_l = \begin{cases} e^{-rl\Delta t} Z^n_l, & \text{if } Z^n_l \ge \hat{C}_l(S^2_{nl}) \\ C^n_l, & \text{if } Z^n_l < \hat{C}_l(S^2_{nl}) \end{cases} \; - \; C^n_{l-1},

where the C^n_l are the result of running a simulation within the simulation, i.e. creating N_3 new paths \{S^3_{mk}\}_{m=1,...,N_3,\, k=l,...,L} starting at S^2_{nl} and stopping according to \tau_{l+1},

C^n_l = \frac{1}{N_3}\sum_{m=1}^{N_3} e^{-r\tau_{l+1}^m \Delta t} Z^m_{\tau_{l+1}^m}. \quad (63)

4. From the results of the procedure above an upper bound estimate can be created,

\hat{U}_0 = \hat{\Delta}_0 + \hat{L}_0,

and a valid 100(1-\alpha)% confidence interval for \hat{U}_0 is given by

\hat{U}_0 \pm z_{1-\alpha/2}\,\sqrt{\hat{\sigma}_L^2 / N_1 + \hat{\sigma}_\Delta^2 / N_2}, \quad (64)

where \hat{\sigma}_\Delta is the estimated standard deviation of \hat{\Delta}_0.

5. A valid 100(1-\alpha)% confidence interval for the true option value can then be determined by

\left[\hat{L}_0 - z_{1-\alpha/2}\,\hat{\sigma}_L/\sqrt{N_1},\;\; \hat{U}_0 + z_{1-\alpha/2}\,\sqrt{\hat{\sigma}_L^2 / N_1 + \hat{\sigma}_\Delta^2 / N_2}\right]. \quad (65)

4 Results

In this section I will present some numerical results. To start with, I concentrate on investigating differences between the robust regression and least squares regression techniques; this is subsection 4.1. Subsections 4.2 and 4.3 are about speeding up the convergence rates via quasi Monte Carlo and importance sampling. In subsection 4.4 I combine quasi Monte Carlo and importance sampling with the RRM method and compare it to the LSM method.

4.1 LSM vs RRM

To investigate how the choice of basis functions influences the result, I created Table 4, where the price of a max call option on 2 assets has been calculated using both least squares regression (LSR) and robust regression (RR) with different bases. The payoff of this option is Z(s_1, s_2) = \max(0, \max(s_1, s_2) - K), and the values of these calculations can be compared with the result from the binomial tree, 8.0735. From the results in Table 4 it is clear that the choice of basis affects the results, and comparing with the value from the binomial tree it seems preferable to choose a basis that includes the payoff Z(s_1, s_2) and the interaction term s_1 s_2.

Table 4: Option value of a Bermudan max call option on 2 stocks for the LSR, RR_Huber and RR_Jonen methods with different bases.

Basis                                                              LSR    RR_Huber  RR_Jonen
s_i^3, s_i^2, s_i, 1                                               8.00   7.96      7.95
s_1s_2, s_i^3, s_i^2, s_i, 1                                       8.02   8.02      8.01
max(s_1,s_2), s_1, s_2, s_i^3, s_i^2, s_i, 1                       8.05   8.06      8.08
s_1s_2^2, s_1^2s_2, s_1s_2, s_i^3, s_i^2, s_i, 1                   8.04   8.02      8.02
Z(s_1,s_2), s_1s_2^2, s_1^2s_2, s_1s_2, s_i^3, s_i^2, s_i, 1       8.03   8.05      8.05
Z(s_1,s_2), s_1s_2, s_i^2, s_i, 1                                  8.05   8.06      8.06

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, S_0^i = 90, i = 1, 2, r = 0.05, \delta_d = 0.1, d = 1, 2, \sigma_d = 0.2, d = 1, 2, \rho_{1,2} = 0. Results are averages of 10 runs with 100000 replications in each run, with \alpha = 0.92 and \beta = 0.993. The result from the binomial tree, calculated with 360 time steps, is 8.0735.

The values in Table 4 were calculated with the parameters \alpha = 0.92 and \beta = 0.993 for determining the transition points. Now, using the basis \{Z(s_1, s_2), s_1s_2, s_i^2, s_i, 1\} from Table 4, I instead examine the effect of different transition points on the same option. The results can be seen in Figure 7: the first three points from the left are the prices using Huber's loss function with different values of \alpha, the following 9 points are the prices using Jonen's loss function with different \alpha and \beta, and the two points to the right are the prices calculated using least squares regression (LSR). The line corresponds to the value from the binomial tree with 360 time steps. The transition points clearly affect the results, and remembering that these estimates actually are low estimates of the option price, it is a good sign that the values from robust regression are higher than the least squares regression values.

Figure 7: Price of a max call option on 2 assets with different transition points in the robust regression method. S_0 = 90 for both stocks; basis according to entry 6 in Table 4. The results are based on the average of 10 runs with 100000 replications per run. For other option parameters see the notes in Table 4.

If we for a moment set aside Andersen and Broadie's approach and try to investigate differences between the RRM and LSM methods without the duality approach, we will see that it is quite hard to find any differences. In Table 5 I have priced a max call option on 2, 3 and 5 underlying assets using both RRM with Huber's loss function and LSM. These values can then be compared to the value from the binomial tree (in the 2 and 3 asset cases). It seems that we get slightly smaller confidence intervals and slightly higher estimates using robust regression. I use different bases depending on the number of assets, namely

\{Z(s_1, s_2), s_1s_2, s_1^2, s_2^2, s_1, s_2, 1\}, \quad (66)

\{1, s_1, s_2, s_3, s_1^2, s_2^2, s_3^2, s_1^3, s_2^3, s_1s_2, s_1s_3, s_2s_3, s_1^2s_2, s_1s_2^2\}, \quad (67)

\{1, s_1, s_2, s_3, s_4, s_5, s_1^2, s_2^2, s_3^2, s_4^2, s_5^2, s_1^3, s_2^3, s_1s_2, s_1s_3, s_2s_3, s_1^2s_2, s_1s_2^2\}, \quad (68)

for the 2, 3 and 5 asset cases respectively, where the stock prices have been ordered such that s_1 > s_2 > \cdots > s_d.

Table 5: Calculated 95% confidence intervals using the LSM and RRM methods for Bermudan max call options on 2, 3 and 5 assets.

Nr of assets  S0   Method  95% CI            Size CI  Est. Price  Binomial tree
2             90   LSM     [7.9658, 8.1130]  0.154    8.0420      8.0735
                   RRM     [7.9828, 8.1334]  0.151    8.0581
              100  LSM     [13.745, 13.937]  0.192    13.841      13.901
                   RRM     [13.781, 13.970]  0.189    13.875
              110  LSM     [21.181, 21.404]  0.223    21.292      21.354
                   RRM     [21.205, 21.427]  0.222    21.316
3             90   LSM     [11.167, 11.341]  0.174    11.254      11.322
                   RRM     [11.177, 11.348]  0.171    11.262
              100  LSM     [18.566, 18.782]  0.216    18.674      18.692
                   RRM     [18.575, 18.789]  0.213    18.682
              110  LSM     [27.443, 27.691]  0.249    27.567      27.620
                   RRM     [27.426, 27.675]  0.249    27.551
5             90   LSM     [16.531, 16.734]  0.203    16.632      -
                   RRM     [16.522, 16.723]  0.201    16.623
              100  LSM     [26.025, 26.267]  0.242    26.146      -
                   RRM     [26.037, 26.278]  0.241    26.158
              110  LSM     [36.574, 36.847]  0.273    36.711      -
                   RRM     [36.617, 36.891]  0.274    36.754

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, \delta_d = 0.1, d = 1, ..., 5, \sigma_d = 0.2, d = 1, ..., 5, \rho_{de} = 0 for all d \ne e. For the binomial tree, 360 time steps are used for 2 assets and 90 time steps for 3 assets. The number of paths in the MC calculations is 100000, repeated 10 times. Bases for 2, 3 and 5 assets are according to (66), (67) and (68), respectively. The transition point is \alpha = 0.90 with the Huber loss function.

To compare the variances between the methods I have also plotted the standard error for the 2 asset case, with S_0 = 100 for both stocks, see Figure 8. From the results in Table 5 and Figure 8 it is hard to see any clear difference between the methods.

Figure 8: Standard error calculated with robust regression and least squares regression for a Bermudan max call option on 2 stocks. S_0 = 100 for both stocks; see Table 5 for other option parameters.

To try to find differences between the RRM and LSM techniques before adding the technique of Andersen and Broadie, I created the histograms in Figure 9 and the density plots in Figure 10. For these figures I calculated the price of a max call option on 3 and 5 assets 2500 resp. 1500 times, using 100000 paths per run. In the density plot I have also added the value calculated with the binomial tree for the 3 asset case (18.69). From the density plots one can see that the distributions differ a little, and again we see that the values calculated with the RRM technique are higher than those from LSM. From these figures it seems as if the data are normally distributed, and this cannot be rejected with statistical tests. Therefore I also perform a two-sided Student's t-test of whether the means of the two populations are equal, and this test shows that there is a significant difference between the means of the two distributions. However, a test for equal variances does not reject, which strengthens the belief that the variances of the populations do not differ. An interesting note from the histograms is that there appear to be more "extreme" values from the LSM method, i.e. values that lie below 18.55 in the 3 asset case and below 25.95 in the 5 asset case.

Figure 9: Histograms of the price of a max call option on 3 and 5 assets. See Table 5 for parameter settings.

Figure 10: Density plots of the price of a max call option on 3 and 5 assets. See Table 5 for parameter settings.

4.1.1 Duality approach

If we add the method of Andersen and Broadie, and thus create both an upper and a lower bound for the option price, it becomes easier to find differences between the RRM and LSM methods. In Table 6 I have priced an arithmetic average call option on 5 assets, whose payoff is given by

Z(s_1, s_2, s_3, s_4, s_5) = \max\left(0,\; \frac{s_1 + s_2 + s_3 + s_4 + s_5}{5} - K\right).

The basis that I use is according to (69), and to make a fair comparison of the methods I set a seed in the pseudo random number generator according to (57). In the table I calculate both a lower and an upper bound of the option price (standard errors in parentheses) and a 95% confidence interval according to (65). The \Delta ratio is simply the difference between the high and low estimators from the LSM approach divided by the difference between the high and low estimators from the RRM approach. The tighter the gap between the lower and upper bounds, and the higher the lower bound, the more accurate the approximated early exercise strategy; therefore this \Delta ratio can be used to compare the methods. As can be seen in Table 6, we get higher lower bounds and tighter upper bounds with the RRM method, using both Huber's loss function and Jonen's loss function, compared to the LSM method. The confidence intervals are smaller and we can see a slight reduction in variance using robust regression. The \Delta ratios are between 2.5 and 4.

Table 6: Low estimator, high estimator and a calculated 95% confidence interval for a Bermudan arithmetic average call option on 5 assets, using both the LSM and RRM methods.

S0   Method       Low             High            95% CI           Size CI   ∆ Ratio
90   LSM          1.534 (0.0048)  1.554 (0.0057)  [1.525, 1.565]   0.040
     RRM, Huber   1.545 (0.0046)  1.552 (0.0048)  [1.536, 1.561]   0.025     3.1
     RRM, Jonen   1.546 (0.0045)  1.552 (0.0047)  [1.537, 1.561]   0.024     3.2
100  LSM          3.955 (0.0068)  3.995 (0.0077)  [3.942, 4.010]   0.068
     RRM, Huber   3.979 (0.0064)  3.995 (0.0067)  [3.966, 4.008]   0.042     2.5
     RRM, Jonen   3.984 (0.0063)  3.994 (0.0064)  [3.971, 4.006]   0.035     4.0
110  LSM          9.313 (0.0078)  9.371 (0.0091)  [9.298, 9.389]   0.091
     RRM, Huber   9.340 (0.0075)  9.364 (0.0080)  [9.326, 9.379]   0.054     2.5
     RRM, Jonen   9.343 (0.0074)  9.359 (0.0077)  [9.329, 9.374]   0.046     3.6

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, δd = 0.1 for d = 1, ..., 5, σd = 0.08 + 0.08·d for d = 1, ..., 5, and ρde = 0 ∀ d ≠ e. The number of paths in the MC calculations are N0 = 400000, N1 = 1000000, N2 = 1500, N3 = 8000. The basis is according to (69). The transition points are α = 0.9 for Huber's loss function and α = 0.87, β = 0.993 for Jonen's loss function.

In Table 7 I again use the seed in (57) but change the option to a max call option on 2 assets and use the basis according to (66). I calculate low and high estimates, 95% confidence intervals and ∆ ratios. For this option, benchmark values can be calculated using the binomial tree: 8.074, 13.90 and 21.35. The calculated confidence intervals cover these "true" values, and again we get higher lower bounds and tighter upper bounds using the RRM method compared to the LSM method. The ∆ ratios are not as large as in Table 6, but they are between 1.4 and 2.4.


Table 7: Low estimator, high estimator and a calculated 95% confidence interval for a Bermudan max call option on 2 assets, using both the LSM and RRM methods.

S0   Method       Low              High             95% CI             Size CI   ∆ Ratio
90   LSM          8.044 (0.012)    8.075 (0.013)    [8.020, 8.100]     0.080
     RRM, Huber   8.062 (0.012)    8.077 (0.012)    [8.039, 8.101]     0.063     2.1
     RRM, Jonen   8.064 (0.012)    8.077 (0.012)    [8.041, 8.101]     0.060     2.4
100  LSM          13.866 (0.0155)  13.918 (0.0163)  [13.836, 13.950]   0.115
     RRM, Huber   13.888 (0.0153)  13.917 (0.0157)  [13.858, 13.948]   0.0903    1.7
     RRM, Jonen   13.895 (0.0152)  13.921 (0.0155)  [13.865, 13.952]   0.0861    2.0
110  LSM          21.289 (0.0180)  21.378 (0.0192)  [21.253, 21.415]   0.162
     RRM, Huber   21.313 (0.0179)  21.376 (0.0187)  [21.278, 21.412]   0.134     1.4
     RRM, Jonen   21.322 (0.0178)  21.374 (0.0184)  [21.287, 21.410]   0.123     1.7

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, δd = 0.1 for d = 1, 2, σd = 0.2 for d = 1, 2, and ρde = 0 ∀ d ≠ e. The number of paths in the MC calculations are N0 = 400000, N1 = 1000000, N2 = 1500, N3 = 8000. The basis is according to (66). The transition points are α = 0.9 for Huber's loss function and α = 0.87, β = 0.993 for Jonen's loss function.

The values in Table 8 have been created without setting a seed in the pseudo-random number generator. In this table I have priced a max call option on 5 underlying assets using the basis in (68). We see a slight reduction in variance, and ∆ ratios between 1.2 and 1.6 in this case.

Table 8: Low estimator, high estimator and a calculated 95% confidence interval for a Bermudan max call option on 5 assets, using both the LSM and RRM methods.

S0   Method       Low              High             95% CI             Size CI   ∆ Ratio
90   LSM          27.398 (0.0329)  27.685 (0.0377)  [27.334, 27.758]   0.425
     RRM, Huber   27.483 (0.0316)  27.725 (0.0356)  [27.422, 27.795]   0.373     1.2
     RRM, Jonen   27.431 (0.0311)  27.680 (0.0359)  [27.370, 27.750]   0.380     1.2
100  LSM          37.614 (0.0380)  38.131 (0.0479)  [37.539, 38.225]   0.686
     RRM, Huber   37.740 (0.0360)  38.055 (0.0437)  [37.670, 38.141]   0.471     1.6
     RRM, Jonen   37.660 (0.0352)  38.008 (0.0413)  [37.591, 38.089]   0.499     1.5
110  LSM          49.116 (0.0416)  49.627 (0.0491)  [49.034, 49.723]   0.689
     RRM, Huber   49.180 (0.0399)  49.621 (0.0497)  [49.102, 49.718]   0.617     1.2
     RRM, Jonen   49.093 (0.0391)  49.461 (0.0453)  [49.017, 49.549]   0.533     1.4

Notes. Common option parameters are T = 3, exercise opportunities = 9, K = 100, r = 0.05, δd = 0.1 for d = 1, ..., 5, σd = 0.08 + 0.08·d for d = 1, ..., 5, and ρde = 0 ∀ d ≠ e. The number of paths in the MC calculations are N0 = 400000, N1 = 1000000, N2 = 1500, N3 = 8000. The basis is according to (68). The transition points are α = 0.9 for Huber's loss function and α = 0.87, β = 0.993 for Jonen's loss function.

4.2 Quasi MC

To examine the effect of quasi Monte Carlo I price a max call option on 2 assets with the same option parameters as in Table 5 and value the option with the same paths that I use to determine the exercise strategy. As can be seen in Figure 11, the values calculated with Sobol numbers converge much faster and stay stable compared to when pseudo-random numbers are used; the values calculated using Sobol numbers appear stable already after 50000 paths. The value from the binomial tree has also been plotted for comparison.

Figure 11: Bermudan max call option on 2 stocks, S0 = 100 for both stocks, valued using robust regression with both pseudo-random numbers and Sobol numbers. The value calculated with the binomial tree, 13.90, is included for comparison. See Table 5 for other parameters.

The same striking result can be seen in Figures 12 and 13, where the lower bound for a max call option on 2 and 3 assets, respectively, has been determined using both Sobol numbers and pseudo-random numbers but the same exercise strategy.
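The thesis text does not show its Sobol implementation; the following is a rough sketch of how Sobol-driven GBM paths can be generated, here with SciPy's scrambled Sobol generator, assigning one coordinate of each quasi-random point to each (time step, asset) pair:

import numpy as np
from scipy.stats import norm, qmc

def sobol_gbm_paths(n_paths, n_steps, n_assets, s0, r, delta, sigma, dt):
    # One Sobol coordinate per (time step, asset): a whole path is a
    # single low-discrepancy point, so the dimension is n_steps * n_assets.
    sampler = qmc.Sobol(d=n_steps * n_assets, scramble=True)
    u = sampler.random(n_paths)                      # uniforms in (0, 1)
    z = norm.ppf(u).reshape(n_paths, n_steps, n_assets)
    log_s = np.full((n_paths, n_assets), np.log(s0))
    paths = np.empty((n_paths, n_steps, n_assets))
    drift = (r - delta - 0.5 * sigma**2) * dt        # risk-neutral GBM drift
    for t in range(n_steps):
        log_s = log_s + drift + sigma * np.sqrt(dt) * z[:, t, :]
        paths[:, t, :] = np.exp(log_s)
    return paths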


Figure 12: Convergence of the low estimator using least squares regression with pseudo-random numbers and Sobol numbers, for a Bermudan max call option on 2 stocks. S0 = 90 for both stocks; see Table 7 for other option parameters. N0 = 500000.

Figure 13: Convergence of the low estimator using least squares regression with both pseudo-random numbers and Sobol numbers, for a Bermudan max call option on 3 stocks. S0 = 100 for all stocks.

4.3 Importance Sampling

In Table 9 I have calculated the lower bound of an arithmetic average call option on 3 assets with expiry dates of both 1 and 3 years, with and without importance sampling. The option is an out-of-the-money (OTM) option with S0 set to 90 and strike price K set to 100. In the case of importance sampling I set the drift of the assets so that the option is in-the-money (ITM) after 4 time steps; thereafter the paths evolve according to the ordinary geometric Brownian motion. To calculate the results I use the basis

{1, s1, s2, s3, s1^2, s2^2, s3^2, (s1 + s2 + s3)/3}, (70)

and use the LSM method to determine the exercise strategy. The seed is set according to (57). The results show that the lower bound can be calculated with a much smaller standard error (SE) using importance sampling than without it. The results also indicate that the importance sampling technique becomes more effective the more unlikely the outcome is.

Table 9: Low estimator and standard error for an arithmetic average call option on 3 assets with maturity dates of 1 and 3 years.

Method          T=1                        T=3
                Low estimator   SE         Low estimator   SE
Standard MC     0.47377         0.0019     1.0964          0.0034
Import. Samp    0.47302         0.00061    1.0961          0.0013
Binomial tree   0.47622                    1.0910

Notes. Common option parameters are exercise opportunities = 9, K = 100, S0 = 90 for all assets, r = 0.05, δd = 0.1 for d = 1, 2, 3, σd = 0.2 for d = 1, 2, 3, and ρde = 0 ∀ d ≠ e. For the binomial tree 90 time steps are used. The number of paths in the MC calculations are N0 = 500000 and N1 = 1000000. The seed is set according to (57) and the basis according to (70). Early exercise regions are calculated with the LSM. The drift when using importance sampling is set so that at-the-money is reached after 4 time steps.
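The importance-sampling code itself is not printed here; the sketch below (one asset, my own parameterisation) shows the mechanics described above: an extra drift θ is added to the Brownian increments of the first few steps, and each path carries the likelihood ratio that keeps the estimator unbiased.

import numpy as np

def shifted_paths_and_weights(n_paths, n_steps, s0, r, delta, sigma, dt,
                              theta, n_shift=4, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    log_s = np.full(n_paths, np.log(s0))
    log_w = np.zeros(n_paths)          # log likelihood ratio per path
    paths = np.empty((n_paths, n_steps))
    for t in range(n_steps):
        z = rng.standard_normal(n_paths)
        if t < n_shift:
            z = z + theta              # drift the first n_shift increments
            # Ratio of the N(0,1) to the N(theta,1) density at the draw.
            log_w += -theta * z + 0.5 * theta**2
        log_s = log_s + (r - delta - 0.5 * sigma**2) * dt \
                + sigma * np.sqrt(dt) * z
        paths[:, t] = np.exp(log_s)
    return paths, np.exp(log_w)

# The price estimate is then mean(weight * discounted payoff), with the
# payoff evaluated along the fixed exercise strategy.

Choosing θ so that the expected asset level reaches the strike after n_shift steps would reproduce the "at-the-money after 4 time steps" drift used in Table 9; the exact θ used in the calculations is not shown in the text.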

I also create a convergence plot (Figure 14) of the low estimator for the same option, using the same parameters as in the example above but without any seed set (the exercise strategy is the same in both cases). One can see that the low estimator seems to be too high compared to the true value from the binomial tree. A more interesting result can be seen in Figure 15, where the standard error has been plotted: there is clearly a big difference between the case when importance sampling is used and when it is not.


Figure 14: Convergence of the low estimator using least squares regression with and without importance sampling, for a Bermudan arithmetic average option on 3 stocks. S0 = 90 for all stocks and T = 3. See Table 9 for other option parameters. N0 = 500000; no seed set, but the same regression coefficients are used in both cases.

Figure 15: Standard error of the low estimator using least squares regression with and without importance sampling, for a Bermudan arithmetic average option on 3 stocks. S0 = 90 for all stocks and T = 3. See Table 9 for other option parameters. N0 = 500000; no seed set, but the same regression coefficients are used in both cases.

4.4 Combinations

As a final result I create a convergence plot of both the low and high estimates of the price of an arithmetic average call option on 2 assets using the basis

{1, s1, s2, s1^2, s2^2, s1^3, s2^3}. (71)
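As a reminder of how this basis enters the algorithm, here is a minimal sketch of one regression step with least squares, as in LSM; the RRM variant would replace the lstsq call with a robust Huber/Jonen fit. Names are illustrative:

import numpy as np

def continuation_values(s1, s2, y):
    # Design matrix for the basis {1, s1, s2, s1^2, s2^2, s1^3, s2^3},
    # built from asset prices at the current exercise date (ITM paths).
    X = np.column_stack([np.ones_like(s1), s1, s2,
                         s1**2, s2**2, s1**3, s2**3])
    # Regress the discounted future cash flows y on the basis functions
    # and return the fitted continuation values.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta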

I do this using both the least squares technique and the robust regression technique to determine the exercise strategy. Once the exercise strategy has been determined, I calculate the lower and upper bounds using the exercise strategy determined by least squares, without any quasi numbers or importance sampling. I also calculate the lower and upper bounds using the exercise strategy from robust regression, but then I add quasi numbers and importance sampling. Importance sampling is only used to calculate the lower bound, and the drift is set so that the in-the-money region is reached after 4 time steps. The results can be seen in Figure 16: robust regression gives a much smaller interval than least squares regression, which can also be seen in Figure 17, where I plot the difference between the upper and lower bounds for both cases. It is also clear that the values calculated using Sobol numbers and importance sampling converge much faster than when pseudo-random numbers are used.

Figure 16: Upper and lower bounds calculated for an arithmetic average call option on 2 assets, using both LSM with pseudo-random numbers and RRM combined with Sobol numbers and importance sampling for the lower bound and Sobol numbers for the upper bound. The calculations with robust regression are made with Jonen's loss function using α = 0.87 and β = 0.993. S0 = 90 for both stocks and T = 3. See Table 9 for other option parameters. N0 = 300000, N1 varies between 10000 and 500000, N2 between 100 and 500, N3 between 200 and
