
MATEMATISKA INSTITUTIONEN, STOCKHOLMS UNIVERSITET

Discrete-time dynamic programming applied to economic theory

by

Markus Pettersson

2019 - No K25

Independent project in mathematics, 15 higher education credits, first cycle
Supervisor: Yishao Zhou

2019

MATEMATISKA INSTITUTIONEN, STOCKHOLMS UNIVERSITET, 106 91 STOCKHOLM

Contents

1 Introduction
2 Dynamic programming in finite time
   2.1 Problem formulation
   2.2 Bellman's principle of optimality
   2.3 The dynamic programming algorithm
3 Extension to infinite time
   3.1 Problem formulation
   3.2 The Bellman equation and optimality
   3.3 Solving the Bellman equation
4 Applications in economics
   4.1 The permanent income hypothesis
   4.2 Optimal investment and Tobin's q
   4.3 Job search and unemployment
   4.4 Optimal economic growth
5 Concluding summary
References
Appendix A  Proof of interchangeability between limit and expectation
Appendix B  Derivation of the optimal-growth value function

1. Introduction

An important area within optimisation concerns decision making in the presence of time. In such dynamic contexts where decisions are made in stages, the outcome of a decision made today will typically affect not only the optimal decision but also the space of available decisions in subsequent stages. Therefore, decisions cannot be viewed in isolation since the desire for an optimal outcome today must be balanced against the desire for optimal outcomes in future stages. As an illustration, consider the following prototypical problem:

Example 1.1 (The stagecoach problem). A stagecoach is travelling from A to J in Figure 1.1. An arrow from one node to another indicates a possible route to travel and its label indicates the length of that route. What is the shortest path from A to J? /

In Example 1.1, choosing the shortest route at each stage yields the overall route A → B → F → I → J of total length 13. This is not the shortest overall route, however; for instance, the route A → D → F → I → J of length 11 is shorter.

Dynamic programming presents a way of solving problems of this type. Introduced by Bellman (1952, 1953), dynamic programming simplifies the problem at hand by recursively breaking it down into a collection of simpler subproblems, where each of those subproblems is solved once. The purpose of this thesis is to address this topic.

Specifically, it aims to describe the underlying theory of dynamic programming and to formulate the type of problems it can solve.

Since its introduction in the 1950s, dynamic programming has become a standard tool in a variety of applied fields that rely heavily on optimisation. One such field is economics, where solving theoretical models in areas ranging from game theory to labour economics to macroeconomics typically boils down to some dynamic optimisation problem. Indeed, in their standard reference on modern macroeconomics, Ljungqvist and Sargent (2012, ch. 1) discuss “the imperialism of recursive methods” and point out that

Dynamic programming is now recognized as a powerful method for studying private agents’ decisions and also the decisions of a government that wants to design an optimal policy in the face of constraints imposed on it by private agents’ best responses to that government policy.

A second aim of this paper is therefore to illustrate and analyse the application of dynamic programming within economic theory. We do so by using our theory to derive two cornerstone results in economics – the permanent income hypothesis and Tobin’s q – as well as to practically solve two economic models – one labour economics search model and one macroeconomic growth model.


[Figure 1.1. The stagecoach problem: a directed network from A to J. Edge lengths: A→B 2, A→C 4, A→D 3; B→E 7, B→F 4, B→G 6; C→E 3, C→F 2, C→G 4; D→E 4, D→F 1, D→G 5; E→H 1, E→I 4; F→H 6, F→I 3; G→H 3, G→I 3; H→J 3; I→J 4.]

In short, dynamic programming is used to solve models of dynamic optimisation, and such models in general require several important considerations. For instance:

(i) Should time be considered continuous or discrete?

(ii) Does the problem have a fixed end stage or does it proceed infinitely far into the future?

(iii) Is the problem deterministic or stochastic? That is, are there random variables that are out of the decision maker’s control that should be considered?

(iv) In the case of a stochastic problem, is the decision maker fully aware of the state he or she is in (or the location he or she is at in Example 1.1) at all times? In other words, is there perfect information with respect to the state or is some estimation of the state required?

In what follows, we will consider discrete-time problems with perfect state information only. We also include stochastic elements but keep them deliberately simple in order to avoid the use of measure theory and Markov chains, which go beyond the scope of this paper. Under these premises, we first consider a rather general theory for problems with finite horizon and then extend this theory to a subclass of problems with infinite time horizon. In particular, we consider stationary and discounted infinite-time problems with a bounded goal function. These choices reflect the fact that discrete-time, discounted models are by far the most common choice considered in economics.

The rest of the paper proceeds as follows. The theory of dynamic programming with finite time horizon is presented in Section 2. In Section 3 we extend this framework to problems with infinite time horizon. The applications of dynamic programming in economics are covered in Section 4 and Section 5 concludes.


2. Dynamic programming in finite time

In this section, we cover the basic theory behind dynamic programming in finite time. In particular, we formulate the type of problems to which we can apply dynamic programming and present the algorithm used to solve them. In what follows, Definitions 2.1 and 2.2 are generalised versions of deterministic counterparts in Voorneveld (2016, ch. 25), while Theorem 2.2 with corresponding proof is from Bertsekas (2005, ch. 1). Lemma 2.1 is formalised and proved independently.

2.1 Problem formulation

In general, suppose we wish to optimise the additively separable objective function
$$\sum_{t=0}^{T} g_t(x_t, u_t, z_t),$$
where $x_t$ evolves according to the system
$$x_{t+1} = f_t(x_t, u_t, z_t), \qquad t = 0, \ldots, T-1.$$

Here, t indexes time and T is the horizon of the system. At each time t there is

(i) a state vector xt that summarises where the system is at time t. It lies in a set X called the state space. The initial state x0 is assumed to be given;

(ii) a control vector ut which is the choice vector. It lies in a set U (xt) called the control space, which in turn depends on the realised state;

(iii) a vector of random variables zt drawn from the set Z (which may be empty).

In order to avoid the use of measure theory and Markov chains, we make the following simplifying assumption throughout the thesis:

Assumption 2.1. The vector of random variables $z_t$ is drawn from a finite set $Z = \{z^1, \ldots, z^N\}$ and is independent across time, states, and controls. That is, for each period $t$ and each feasible history $x^t = \{x_t, \ldots, x_0\}$, $u^t = \{u_t, \ldots, u_0\}$, $z^{t-1} = \{z_{t-1}, \ldots, z_0\}$,
$$\Pr\bigl(z_t = z^i \mid x^t, u^t, z^{t-1}\bigr) = \Pr\bigl(z_t = z^i\bigr) \equiv p_i \quad\text{with}\quad \sum_{i=1}^{N} p_i = 1,$$
where $p_i$ denotes the constant unconditional probability of observing $z^i$. /


The presence of a stochastic element $z_t$ makes the optimisation problem stochastic, as we do not know ex ante the future realisations of $z_t$. This implies that we in fact optimise the expectation of the objective function with respect to the probability distribution of $z_t$, rather than optimising the objective function itself. We can therefore formally define this optimisation problem as follows:

Definition 2.1 (The finite-time dynamic programming problem). A discrete-time dynamic programming problem with finite horizon is of the form
$$\sup_{\{u_t\}_{t=0}^{T}} \; \mathbb{E}\left[\sum_{t=0}^{T} g_t(x_t, u_t, z_t)\right]$$
subject to
$$u_t \in U(x_t), \quad t = 0, \ldots, T,$$
$$z_t \in Z, \quad t = 0, \ldots, T,$$
$$x_{t+1} = f_t(x_t, u_t, z_t), \quad t = 0, \ldots, T-1,$$
$$x_0 \text{ given},$$
where Assumption 2.1 is satisfied and $\mathbb{E}$ is the expectations operator with respect to $z_t$, defined as $\mathbb{E}[z_t] = \sum_{i=1}^{N} p_i z^i$. /

Note that we could just as well have stated Definition 2.1 as a minimisation problem by defining the function $h_t(x_t, u_t, z_t) = -g_t(x_t, u_t, z_t)$. We define the feasible options of the dynamic programming problem as follows:

Definition 2.2. Consider the dynamic programming problem in Definition 2.1.

• For $x_0$ given, a pair $(x, u) = \{(x_0, u_0), \ldots, (x_T, u_T)\}$ such that $u_t \in U(x_t)$ for all $t$ and $x_{t+1} = f_t(x_t, u_t, z_t)$ for all $t + 1 > 0$ is said to be admissible or feasible. An admissible pair is optimal and $u = \{u_0, \ldots, u_T\}$ is an optimal control if there is no other admissible pair that yields a higher expected value of the objective function.

• A policy is a sequence of functions $\pi = \{\pi_0, \ldots, \pi_T\}$ where $\pi_t$ maps states $x_t$ into controls $u_t = \pi_t(x_t)$ in $U(x_t)$ for all $x_t$. Thus, a policy $\pi$ yields an admissible pair $(x, u)$, and such a policy is optimal if there is no other policy with a higher expected value of the objective function. /

Since a policy yields an admissible pair, choosing a policy is equivalent to choosing controls $u_t$; given $u_t = \pi_t(x_t)$, for any function $g$ of $x_t$, $u_t$, and $z_t$ we have
$$\sup_{\pi_t \in \Pi_t} g\bigl(x_t, \pi_t(x_t), z_t\bigr) = \sup_{u_t \in U(x_t)} g(x_t, u_t, z_t),$$
where $\Pi_t$ is the set of all functions $\pi_t(x_t)$ such that $\pi_t(x_t) \in U(x_t)$ for all $x_t$.

2.2 Bellman’s principle of optimality

Solving the optimisation problem in Definition 2.1 relies on the principle of optimality, which Bellman (1953) states as follows:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

In essence, the optimality principle states that if $u = \{u_0, \ldots, u_T\}$ is an optimal control for the dynamic programming problem and $x_s$ at time $s$ occurs with positive probability under this policy, then the sequence of controls $\{u_s, \ldots, u_T\}$ must be optimal in the subproblem of optimising the remaining objective function
$$\mathbb{E}\left[\sum_{t=s}^{T} g_t(x_t, u_t, z_t)\right]$$
that starts in period $s$ and state $x_s$. We formulate this principle with the lemma below.

Lemma 2.1 (The principle of optimality). If the admissible pair $(x, u)$ solves the dynamic programming problem, then $\{(x_t, u_t)\}_{t=s}^{T}$ solves the subproblem starting in period $s$ with initial state $x_s$.

Proof. We use induction over $s$. The result holds by assumption for $s = 0$. Now assume it holds for some $s \geq 0$. By linearity of the expectations operator, we can write
$$\mathbb{E}\left[\sum_{t=s}^{T} g_t(x_t, u_t, z_t)\right] = \mathbb{E}\bigl[g_s(x_s, u_s, z_s)\bigr] + \mathbb{E}\left[\sum_{t=s+1}^{T} g_t(x_t, u_t, z_t)\right].$$
Since $\{x_t, u_t\}_{t=s}^{T}$ is optimal, we have by definition that for any feasible pair $\{\hat{x}_t, \hat{u}_t\}_{t=s+1}^{T}$,
$$\mathbb{E}\bigl[g_s(x_s, u_s, z_s)\bigr] + \mathbb{E}\left[\sum_{t=s+1}^{T} g_t(x_t, u_t, z_t)\right] \geq \mathbb{E}\bigl[g_s(x_s, u_s, z_s)\bigr] + \mathbb{E}\left[\sum_{t=s+1}^{T} g_t(\hat{x}_t, \hat{u}_t, z_t)\right].$$
Simplifying this expression we get
$$\mathbb{E}\left[\sum_{t=s+1}^{T} g_t(x_t, u_t, z_t)\right] \geq \mathbb{E}\left[\sum_{t=s+1}^{T} g_t(\hat{x}_t, \hat{u}_t, z_t)\right],$$
so $\{(x_t, u_t)\}_{t=s+1}^{T}$ is also optimal for the subproblem starting in period $s + 1$ with initial state $x_{s+1}$.

This result can be illustrated with the following simplistic travel analogy: if the fastest route from Stockholm to Copenhagen goes through Malmö, then the Malmö-to-Copenhagen part of that route is also the fastest route from Malmö to Copenhagen.

2.3 The dynamic programming algorithm

The principle of optimality suggests that we can solve the dynamic programming problem as follows. First, find the optimal control for the subproblem involving the last period


only. By the principle of optimality we know that the optimal last-period control $u_T$ must also be optimal in period $T$ in the subproblem involving the last two periods. Thus, we can substitute backwards and use this control to solve for the optimal controls of the subproblem involving the last two periods. Continuing backwards in this manner enables us to sequentially solve for the optimal controls of the full problem. More formally, for a given time $s \in \{0, \ldots, T\}$ and state $x_s$, define the optimal value function as
$$J_s(x_s) = \sup_{\{u_t\}_{t=s}^{T}} \mathbb{E}\left[\sum_{t=s}^{T} g_t(x_t, u_t, z_t)\right], \qquad (2.1)$$
where $(x_t, u_t, z_t) \in X \times U(x_t) \times Z$ for all $t \in \{s, \ldots, T\}$. For the final-period problem, where $s = T$, we need not worry about future consequences from the choice of control. Thus, $J_T(x_T)$ must necessarily satisfy
$$J_T(x_T) = \sup_{u_T \in U(x_T)} \mathbb{E}\bigl[g_T(x_T, u_T, z_T)\bigr].$$
Now, in the subproblem involving the last two periods, we know that choosing a control $u_{T-1}$ yields an instantaneous payoff $g_{T-1}(x_{T-1}, u_{T-1}, z_{T-1})$ and a next-period state $x_T = f_{T-1}(x_{T-1}, u_{T-1}, z_{T-1})$. Since we know the optimal value of the subproblem involving only the final period, we therefore know that
$$J_T(x_T) = J_T\bigl(f_{T-1}(x_{T-1}, u_{T-1}, z_{T-1})\bigr).$$
Clearly, the best thing to do is to optimise the sum of these two expressions:
$$J_{T-1}(x_{T-1}) = \sup_{u_{T-1}} \mathbb{E}\Bigl[g_{T-1}(x_{T-1}, u_{T-1}, z_{T-1}) + J_T\bigl(f_{T-1}(x_{T-1}, u_{T-1}, z_{T-1})\bigr)\Bigr].$$
Continuing backwards we then get for each time $s \in \{0, \ldots, T-1\}$ and state $x_s$ that
$$J_s(x_s) = \sup_{u_s \in U(x_s)} \mathbb{E}\Bigl[g_s(x_s, u_s, z_s) + J_{s+1}\bigl(f_s(x_s, u_s, z_s)\bigr)\Bigr].$$
This is exactly the dynamic programming algorithm, which we state and prove below.

Theorem 2.2 (The dynamic programming algorithm). The optimal value function satisfies
$$J_T(x_T) = \sup_{u_T \in U(x_T)} \mathbb{E}\bigl[g_T(x_T, u_T, z_T)\bigr], \qquad (2.2)$$
$$J_s(x_s) = \sup_{u_s \in U(x_s)} \mathbb{E}\Bigl[g_s(x_s, u_s, z_s) + J_{s+1}\bigl(f_s(x_s, u_s, z_s)\bigr)\Bigr], \quad s = 0, \ldots, T-1, \qquad (2.3)$$
where the expectation is taken with respect to the probability distribution of $z_s$. For $x_0$ given, it follows that $J_0(x_0)$ is the optimal value of the dynamic programming problem. Moreover, if $u_s = \pi_s(x_s)$ solves the right-hand sides of Equations (2.2) and (2.3) for each $x_s$ and $s$, the policy $\pi = \{\pi_0, \ldots, \pi_T\}$ is optimal and $u = \{u_0, \ldots, u_T\}$ is an optimal control.


Proof. We use induction over $s$ to show that for all $s \in \{0, \ldots, T\}$, the optimal value function $J_s(x_s)$ defined by Equation (2.1) is identical to $J_s(x_s)$ given by the dynamic programming algorithm in Equations (2.2) and (2.3). For the base case, let $s = T$. Then by Equation (2.1), we have for all $x_T$ that
$$J_T(x_T) = \sup_{\{u_t\}_{t=T}^{T}} \mathbb{E}\left[\sum_{t=T}^{T} g_t(x_t, u_t, z_t)\right] = \sup_{u_T} \mathbb{E}\bigl[g_T(x_T, u_T, z_T)\bigr],$$
which is indeed identical to Equation (2.2). So the base case holds. For the induction step, suppose Equations (2.2) and (2.3) hold for $s+1, \ldots, T$ for some $s < T$. Then for all $x_s$,
$$\begin{aligned}
J_s(x_s) &= \sup_{\{u_t\}_{t=s}^{T}} \mathbb{E}\left[g_s(x_s, u_s, z_s) + \sum_{t=s+1}^{T} g_t(x_t, u_t, z_t)\right] && \text{(by (2.1))} \\
&= \sup_{u_s} \mathbb{E}\left[g_s(x_s, u_s, z_s) + \sup_{\{u_t\}_{t=s+1}^{T}} \mathbb{E}\left\{\sum_{t=s+1}^{T} g_t(x_t, u_t, z_t)\right\}\right] && \text{(by Lemma 2.1)} \\
&= \sup_{u_s} \mathbb{E}\Bigl[g_s(x_s, u_s, z_s) + J_{s+1}\bigl(f_s(x_s, u_s, z_s)\bigr)\Bigr] && \text{(by (2.1))} \\
&= \sup_{u_s} \mathbb{E}\Bigl[g_s(x_s, u_s, z_s) + J_{s+1}\bigl(f_s(x_s, u_s, z_s)\bigr)\Bigr] && \text{(induction hypothesis)} \\
&= J_s(x_s) && \text{(by (2.3))},
\end{aligned}$$
where $J_{s+1}$ in the third line denotes the value function defined by (2.1) and in the fourth line the function produced by the algorithm; the two coincide by the induction hypothesis. It follows that the induction step also holds and this proves the theorem.

We finish this section by demonstrating the dynamic programming algorithm by applying it to the stagecoach problem in Example 1.1.

Example 1.1 (Cont.). In Figure 1.1, we wish to find the shortest path from A to J. This is equivalent to maximising the additive inverses of the path lengths, so we can apply the dynamic programming algorithm. Moreover, there is no stochastic element involved, so $Z = \emptyset$ and we do not have to consider expected values. In stage $t$, the state is the current location and the control is the route chosen. Then we immediately have that $J_3(H) = 3$ and $J_3(I) = 4$ since there is only one feasible control in these states, namely $u_3 = J$. Working backwards we see that
$$\begin{aligned}
J_2(E) &= \min\{1 + J_3(H),\; 4 + J_3(I)\} = \min\{4, 8\} = 4, \\
J_2(F) &= \min\{6 + J_3(H),\; 3 + J_3(I)\} = \min\{9, 7\} = 7, \\
J_2(G) &= \min\{3 + J_3(H),\; 3 + J_3(I)\} = \min\{6, 7\} = 6,
\end{aligned}$$
with corresponding optimal controls $u_2 = H$, $u_2 = I$, and $u_2 = H$, respectively. Going back one more period we have
$$\begin{aligned}
J_1(B) &= \min\{7 + J_2(E),\; 4 + J_2(F),\; 6 + J_2(G)\} = \min\{11, 11, 12\} = 11, \\
J_1(C) &= \min\{3 + J_2(E),\; 2 + J_2(F),\; 4 + J_2(G)\} = \min\{7, 9, 10\} = 7, \\
J_1(D) &= \min\{4 + J_2(E),\; 1 + J_2(F),\; 5 + J_2(G)\} = \min\{8, 8, 11\} = 8,
\end{aligned}$$

with corresponding optimal controls $u_1 \in \{E, F\}$, $u_1 = E$, and $u_1 \in \{E, F\}$, respectively. It follows that the length of the shortest path is
$$J_0(A) = \min\{2 + J_1(B),\; 4 + J_1(C),\; 3 + J_1(D)\} = \min\{13, 11, 11\} = 11$$
and the routes that achieve this are
$$A \to C \to E \to H \to J, \qquad A \to D \to E \to H \to J, \qquad A \to D \to F \to I \to J.$$
These routes are illustrated in Figure 2.1. /

[Figure 2.1. Solution to the stagecoach problem: the network of Figure 1.1 with the three shortest routes A→C→E→H→J, A→D→E→H→J, and A→D→F→I→J marked.]
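The same backward pass can also be spelled out directly for the stagecoach network. The short Python sketch below is again only an illustration (the data structure and names are our own); it encodes the edge lengths of Figure 1.1 and reproduces $J_0(A) = 11$ together with the three optimal routes found above.

```python
# Edge lengths from Figure 1.1: dist[node] maps each successor to the route length.
dist = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
    "J": {},
}

J = {"J": 0}                       # terminal node: zero remaining length
best = {}                          # optimal successor(s) at each node
stages = [["H", "I"], ["E", "F", "G"], ["B", "C", "D"], ["A"]]
for stage in stages:               # backward induction, stage by stage
    for node in stage:
        values = {nxt: d + J[nxt] for nxt, d in dist[node].items()}
        J[node] = min(values.values())
        best[node] = [nxt for nxt, v in values.items() if v == J[node]]

print(J["A"])                      # 11, the length of the shortest path
print(best)                        # the ties at A and D generate the three optimal routes
```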


3. Extension to infinite time

We now generalise the results of the previous section to allow for an infinite time horizon.

That is, we let T → ∞. Doing so introduces a number of complications. For instance, the goal function now becomes a series, so we need to ensure that it is well-defined. Moreover, the dynamic programming algorithm in Section 2 works backward from some finite end period. In infinite time, no such period exists. Below we show how to account for these issues for a subclass of infinite-time problems, that of discounted, stationary problems with bounded goal function, and outline how to solve them. Unless stated otherwise, the theorems in this section are from Bertsekas (2007, ch. 1) while the proofs and definitions are generalised versions of deterministic counterparts in Voorneveld (2016, ch. 27–28).

3.1 Problem formulation

Bertsekas (2007, p. 2) identifies four principal classes of infinite-time dynamic programs:

(i) stochastic shortest path problems; (ii) stationary discounted problems with bounded objective functions; (iii) problems with unbounded objective functions; and (iv) problems that optimise the average of the per-stage objective functions. Since the primary concern in this thesis is to investigate dynamic programming within the context of economic theory, we restrict our attention here to the second class of problems, that of bounded discounted problems, as it is by far the most common type in economics.

Definition 3.1 (The discounted infinite-time dynamic programming problem). A stationary and discounted discrete-time dynamic programming problem with infinite horizon and discount factor $\beta \in (0, 1)$ is of the form
$$\sup_{\{u_t\}_{t=0}^{\infty}} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t g(x_t, u_t, z_t)\right]$$
subject to
$$u_t \in U(x_t), \quad t = 0, 1, 2, \ldots,$$
$$z_t \in Z, \quad t = 0, 1, 2, \ldots,$$
$$x_{t+1} = f(x_t, u_t, z_t), \quad t = 0, 1, 2, \ldots,$$
$$x_0 \text{ given},$$
where Assumption 2.1 is satisfied and $\mathbb{E}$ is defined as in Definition 2.1. The problem is stationary since neither $g$ nor $f$ depends on time. /

Admissible pairs, controls, and policies are defined analogously to Definition 2.2. To guarantee that the objective function is summable, we assume the following throughout:


Assumption 3.1. For some real scalar $M$, the function $g$ satisfies $|g(x, u, z)| \leq M$ for all $(x, u, z) \in X \times U(x) \times Z$. /

This makes the objective function well-defined; if $(x, u)$ is an admissible pair, then
$$\mathbb{E}\left[\sum_{t=0}^{T} \beta^t \bigl|g(x_t, u_t, z_t)\bigr|\right] \leq \sum_{t=0}^{T} \beta^t M = \frac{1 - \beta^{T+1}}{1 - \beta}\, M \;\to\; \frac{M}{1 - \beta} \quad \text{as } T \to \infty,$$
making the left-hand side summable. Note also that the objective function in Definition 3.1 in fact should be $\lim_{T \to \infty} \mathbb{E}\bigl[\sum_{t=0}^{T} \beta^t g(x_t, u_t, z_t)\bigr]$, which is not in general equal to our formulation. However, Bertsekas (2007, pp. 3–4) points out that under Assumptions 2.1 and 3.1 these are indeed equal and thus allow for the formulation above. We prove this in Appendix A.

3.2 The Bellman equation and optimality

Solving the infinite-time dynamic programming problem still rests on the principle of optimality. In our case, this principle is completely analogous to the finite-horizon case; just let $g_t(\cdot) = \beta^t g(\cdot)$ and $T \to \infty$ in Lemma 2.1 and the corresponding proof to get that if $(x, u)$ is optimal in the infinite-time problem, then $\{(x_t, u_t)\}_{t=s}^{\infty}$ is optimal in the infinite-time subproblem starting in period $s$ with initial state $x_s$. Thus, in what follows we refer to Lemma 2.1 also as the infinite-horizon version of the optimality principle. Now, in order to solve the infinite-time problem, we start off as in the finite-horizon case and define the optimal value function:

Definition 3.2 (The value function). For a given state $x \in X$, we define the optimal value function as
$$J(x) = \sup_{\{u_t\}_{t=0}^{\infty}} \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t g(x_t, u_t, z_t)\right], \qquad (3.1)$$
where $x_t \in X$, $u_t \in U(x_t)$, and $z_t \in Z$ for all $t$. /

This definition does not specify a time period for $x$, so the value function applies to any subproblem starting in an arbitrary time period $s$ with initial state $x_s$. From this definition it is clear that the optimal value of such a subproblem is $\beta^s J(x_s)$ and thus that $J(x_0)$ is the optimal value of the full dynamic program. Solving the original problem is therefore equivalent to finding $J(x_0)$. This is in general not a straightforward task, but the following theorem provides us with a helpful tool in this regard.

Theorem 3.1 (The Bellman equation). For all $x \in X$, the value function satisfies the Bellman equation
$$J(x) = \sup_{u \in U(x)} \mathbb{E}\Bigl[g(x, u, z) + \beta J\bigl(f(x, u, z)\bigr)\Bigr], \qquad (3.2)$$
where $u$ is the control and $z$ is the realisation of the stochastic element in the time period of state $x$.


Proof. For all $x \in X$,
$$\begin{aligned}
J(x) &= \sup_{\{u_t\}_{t=0}^{\infty}} \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t g(x_t, u_t, z_t)\right] && \text{(by (3.1))} \\
&= \sup_{u} \mathbb{E}\left[g(x, u, z) + \sup_{\{u_t\}_{t=1}^{\infty}} \mathbb{E}\left\{\sum_{t=1}^{\infty} \beta^t g(x_t, u_t, z_t)\right\}\right] && \text{(by Lemma 2.1)} \\
&= \sup_{u} \mathbb{E}\left[g(x, u, z) + \beta \sup_{\{u_{t+1}\}_{t=0}^{\infty}} \mathbb{E}\left\{\sum_{t=0}^{\infty} \beta^t g(x_{t+1}, u_{t+1}, z_{t+1})\right\}\right] && \text{(relabelling)} \\
&= \sup_{u} \mathbb{E}\Bigl[g(x, u, z) + \beta J\bigl(f(x, u, z)\bigr)\Bigr] && \text{(by (3.1))}.
\end{aligned}$$

Theorem 3.1 essentially states that the value function $J(x)$ is a fixed point¹ of the mapping $F$, defined for bounded functions $J : X \to \mathbb{R}$ by
$$F J(x) = \sup_{u \in U(x)} \mathbb{E}\Bigl[g(x, u, z) + \beta J\bigl(f(x, u, z)\bigr)\Bigr]. \qquad (3.3)$$

¹ A point $x \in X$ is called a fixed point under a function $f : X \to X$ if $x = f(x)$.

Also note the similarity between the Bellman equation (3.2) and the dynamic programming algorithm in Section 2. The difference here is that since we do not iterate backwards from some end period, the value function on the right-hand side is typically unknown. Moreover, the Bellman equation is only a necessary condition for the value function; there may be other functions that satisfy it as well. The first of these two concerns can be handled via the following result:

Theorem 3.2. For any bounded function $\hat{J} : X \to \mathbb{R}$, the value function satisfies
$$J(x) = \lim_{N \to \infty} F^N \hat{J}(x) \quad \text{for all } x \in X,$$
where $F$ is defined by Equation (3.3) and $F^N$ denotes the composition of $F$ with itself $N$ times. By convention, $F^0 \hat{J}(x) \equiv \hat{J}(x)$.

Proof (derived independently). We use induction to first show that
$$F^N \hat{J}(x_0) = \sup_{\{u_t\}_{t=0}^{N-1}} \mathbb{E}\left[\sum_{t=0}^{N-1} \beta^t g(x_t, u_t, z_t) + \beta^N \hat{J}(x_N)\right].$$
The base case $N = 0$ holds trivially as $F^0 \hat{J}(x_0) \equiv \hat{J}(x_0)$. For the induction step, suppose the result holds for some $N - 1 \geq 0$. Then by Equation (3.3),
$$F^N \hat{J}(x_0) = F\bigl(F^{N-1} \hat{J}\bigr)(x_0) = \sup_{u_0} \mathbb{E}\Bigl[g(x_0, u_0, z_0) + \beta F^{N-1} \hat{J}\bigl(f(x_0, u_0, z_0)\bigr)\Bigr]$$
and invoking the induction hypothesis yields
$$\begin{aligned}
F^N \hat{J}(x_0) &= \sup_{u_0} \mathbb{E}\left[g(x_0, u_0, z_0) + \beta \sup_{\{u_{t+1}\}_{t=0}^{N-2}} \mathbb{E}\left\{\sum_{t=0}^{N-2} \beta^t g(x_{t+1}, u_{t+1}, z_{t+1}) + \beta^{N-1} \hat{J}(x_N)\right\}\right] \\
&= \sup_{\{u_t\}_{t=0}^{N-1}} \mathbb{E}\left[\sum_{t=0}^{N-1} \beta^t g(x_t, u_t, z_t) + \beta^N \hat{J}(x_N)\right].
\end{aligned}$$
So the result also holds for $N$, which proves the claim. Now, since $\hat{J}$ is bounded by assumption and $\beta \in (0, 1)$, we necessarily have that $\beta^N \hat{J}(x_N) \to 0$ as $N \to \infty$. It follows that
$$F^N \hat{J}(x_0) \;\to\; \sup_{\{u_t\}_{t=0}^{\infty}} \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t g(x_t, u_t, z_t)\right] = J(x_0) \quad \text{as } N \to \infty.$$
Thus, $J(x) = \lim_{N \to \infty} F^N \hat{J}(x)$ as required.

This allows us to at least approximate J(x). Moreover, an immediate consequence of Theorem 3.2 is that the Bellman equation is indeed a sufficient condition within the class of bounded functions, which takes care of our second concern.

Theorem 3.3. J(x) is the unique bounded solution to the Bellman equation.

Proof (Bertsekas, 2007, p. 12). Suppose $\hat{J}$ is a bounded function that satisfies the Bellman equation. Then $\hat{J}(x) = F \hat{J}(x)$ for all $x$, and it follows that $\hat{J}(x) = \lim_{N \to \infty} F^N \hat{J}(x)$. By Theorem 3.2, $\lim_{N \to \infty} F^N \hat{J}(x) = J(x)$, so $\hat{J}(x) = J(x)$.

Having shown that there is just one bounded solution to the Bellman equation, we can state the necessary and sufficient condition for optimality, which in turn shows that we can solve the dynamic programming problem by solving the Bellman equation.

Theorem 3.4 (Necessary and sufficient condition for optimality). The policy $\pi = \{\pi_t\}_{t=0}^{\infty}$ is optimal and $u = \{u_t\}_{t=0}^{\infty}$ is an optimal control for the infinite-time dynamic programming problem if and only if $u$ solves the Bellman equation for the value function $J$.

Proof. Suppose $(x, u)$ is an admissible pair that solves the Bellman equation (3.2) for the value function $J$:
$$J(x_t) = \mathbb{E}\Bigl[g(x_t, u_t, z_t) + \beta J\bigl(f(x_t, u_t, z_t)\bigr)\Bigr] \quad \text{for all } t = 0, 1, 2, \ldots.$$
By recursion over $t$, we then have
$$J(x_0) = \mathbb{E}\Bigl[g(x_0, u_0, z_0) + \beta\bigl(g(x_1, u_1, z_1) + \beta J\bigl(f(x_1, u_1, z_1)\bigr)\bigr)\Bigr] = \ldots = \mathbb{E}\left[\sum_{t=0}^{T} \beta^t g(x_t, u_t, z_t) + \beta^{T+1} J\bigl(f(x_T, u_T, z_T)\bigr)\right].$$
Since $J$ is bounded (by Assumption 3.1), $\beta^{T+1} J(x_{T+1}) \to 0$ as $T \to \infty$. Letting $T \to \infty$ therefore gives
$$J(x_0) = \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t g(x_t, u_t, z_t)\right],$$
so $u$ solves the dynamic programming problem. Conversely, suppose that the admissible pair $(x, u)$ solves the dynamic programming problem. We then know by Lemma 2.1 that $\{u_{t+s}\}_{s=0}^{\infty}$ solves the subproblem starting in period $t$ with initial state $x_t$. Using Lemma 2.1 twice yields
$$J(x_t) = \mathbb{E}\left[g(x_t, u_t, z_t) + \beta \sum_{s=0}^{\infty} \beta^s g(x_{t+s+1}, u_{t+s+1}, z_{t+s+1})\right] = \mathbb{E}\Bigl[g(x_t, u_t, z_t) + \beta J\bigl(f(x_t, u_t, z_t)\bigr)\Bigr].$$
Since the value function satisfies the Bellman equation by Theorem 3.1, it follows that $u$ solves the Bellman equation.

3.3 Solving the Bellman equation

The necessary and sufficient condition for optimality presents us with a method to solve the dynamic programming problem: by solving the Bellman equation. In principle this is easy. However, Theorem 3.4 only applies if we already know the value function $J$. As previously pointed out, this is generally not the case, and the key concern when solving these problems is thus to find the value function. We therefore finish off this section by discussing the methods used in this regard. In particular, Ljungqvist and Sargent (2012, ch. 3.1.1) list three main types of computational methods to find the value function:

(I). Guess and verify. The first method involves guessing (a bounded) $J$ and verifying that it is indeed a solution to the Bellman equation (3.2). This method relies on Theorem 3.3; if we find a bounded function that satisfies the Bellman equation, then it has to be the value function. However, as this method depends on luck in making a good guess, it is of limited practical use.

(II). Value function iteration. A second method, value function iteration, applies the mapping $F$ defined by Equation (3.3) to construct a sequence of value functions and corresponding controls by iteration as follows:


(i) Start with some bounded function $J_0 : X \to \mathbb{R}$, for instance the zero function $J_0(x) = 0$ for all $x \in X$.

(ii) At each iteration $n$, calculate $J_{n+1}(x) = F J_n(x)$. That is, let
$$J_{n+1}(x) = \sup_{u \in U(x)} \mathbb{E}\Bigl[g(x, u, z) + \beta J_n\bigl(f(x, u, z)\bigr)\Bigr].$$
By Theorem 3.2, we know that $F^n J_0(x) \to J(x)$ as $n \to \infty$ for any bounded $J_0$, so the constructed sequence of value functions is guaranteed to converge to $J$.
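As a minimal sketch of this procedure for a problem with finite state, control, and shock sets (the tabular representation, names, and stopping tolerance are our own assumptions, not part of the thesis), the following Python fragment iterates $J_{n+1} = F J_n$ until the sup-norm change is negligible; by Theorem 3.2 the iterates converge to $J$ from any bounded starting guess.

```python
def value_iteration(states, controls, shocks, probs, g, f, beta, tol=1e-10):
    """Iterate J_{n+1} = F J_n, with F as in Equation (3.3), until convergence."""
    J = {x: 0.0 for x in states}                  # bounded initial guess J_0 = 0
    while True:
        J_new, diff = {}, 0.0
        for x in states:
            J_new[x] = max(
                sum(p * (g(x, u, z) + beta * J[f(x, u, z)])
                    for z, p in zip(shocks, probs))
                for u in controls(x)
            )
            diff = max(diff, abs(J_new[x] - J[x]))
        J = J_new
        if diff < tol:                            # stop once successive iterates barely change
            return J
```

A final pass over the controls with the converged value function then recovers an optimal stationary policy.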

(III). Policy function iteration. The last method, policy function iteration, is similar to value function iteration and uses the same mapping $F$, but it iterates over feasible policies instead of the value function. It consists of the following steps:

(i) Start with some feasible policy $\pi_0 : X \to U$ such that $\pi_0(x) \in U(x)$ for all $x \in X$.

(ii) Policy evaluation: at the start of each iteration $n$, calculate the value $J_n$ of the chosen policy $\pi_n(x)$:
$$J_n(x) = \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t g\bigl(x_t, \pi_n(x_t), z_t\bigr)\right] \quad \text{with } x_{t+1} = f\bigl(x_t, \pi_n(x_t), z_t\bigr) \text{ and } x_0 = x.$$

(iii) Policy improvement: generate a new policy $\pi_{n+1}(x) \in U(x)$ that satisfies the mapping $F$ for $J_n$. That is, let
$$\pi_{n+1}(x) = \arg\max_{\pi(x) \in U(x)} \mathbb{E}\Bigl[g\bigl(x, \pi(x), z\bigr) + \beta J_n\bigl(f(x, \pi(x), z)\bigr)\Bigr].$$

This algorithm generates better and better policies: $J_{n+1} \geq J_n$ for all $n$. In the limit, $J_n \to J$ and the corresponding policy is optimal. To see this, we follow Voorneveld (2016, p. 149) and define, for each policy $\pi_n$, the mapping $F_n$ acting on bounded functions by
$$F_n J(x) = \mathbb{E}\Bigl[g\bigl(x, \pi_n(x), z\bigr) + \beta J\bigl(f(x, \pi_n(x), z)\bigr)\Bigr].$$
This mapping is monotonic: if $J(x) \geq V(x)$ for all $x \in X$, then
$$F_n J(x) = \mathbb{E}\Bigl[g\bigl(x, \pi_n(x), z\bigr) + \beta J\bigl(f(x, \pi_n(x), z)\bigr)\Bigr] \geq \mathbb{E}\Bigl[g\bigl(x, \pi_n(x), z\bigr) + \beta V\bigl(f(x, \pi_n(x), z)\bigr)\Bigr] = F_n V(x).$$
After the policy evaluation and policy improvement steps we then have, respectively,
$$F_n J_n(x) = \mathbb{E}\Bigl[g\bigl(x, \pi_n(x), z\bigr) + \beta J_n\bigl(f(x, \pi_n(x), z)\bigr)\Bigr] = J_n(x),$$
$$F_{n+1} J_n(x) = \mathbb{E}\Bigl[g\bigl(x, \pi_{n+1}(x), z\bigr) + \beta J_n\bigl(f(x, \pi_{n+1}(x), z)\bigr)\Bigr].$$
Since $\pi_{n+1}(x)$ is chosen optimally, we necessarily have that $F_{n+1} J_n(x) \geq F_n J_n(x)$. Monotonicity of $F_{n+1}$ implies $F_{n+1}^k J_n(x) \geq F_n J_n(x)$ for all $k$, where $F_{n+1}^k$ is the composition of $F_{n+1}$ with itself $k$ times. By recursion over $k$, we can write $F_{n+1}^k$ as
$$F_{n+1}^k J_n(x) = \mathbb{E}\left[\sum_{t=0}^{k-1} \beta^t g\bigl(x_t, \pi_{n+1}(x_t), z_t\bigr) + \beta^k J_n(x_k)\right],$$
and we therefore have $F_{n+1}^k J_n(x) \to J_{n+1}(x)$ in the limit. This shows that $J_{n+1} \geq J_n$. By Assumption 3.1, $J_n$ is bounded, and in particular it is bounded from above by $J$ due to Definition 3.2. The algorithm therefore constructs a sequence $\{J_n\}_{n=0}^{\infty}$ that is increasing and bounded from above: it must converge. In the limit, $\pi_n(x) = \pi_{n+1}(x)$ and so
$$J_n(x) = \mathbb{E}\Bigl[g\bigl(x, \pi_n(x), z\bigr) + \beta J_n\bigl(f(x, \pi_n(x), z)\bigr)\Bigr] = \mathbb{E}\Bigl[g\bigl(x, \pi_{n+1}(x), z\bigr) + \beta J_n\bigl(f(x, \pi_{n+1}(x), z)\bigr)\Bigr] = \sup_{\pi(x) \in U(x)} \mathbb{E}\Bigl[g\bigl(x, \pi(x), z\bigr) + \beta J_n\bigl(f(x, \pi(x), z)\bigr)\Bigr].$$
Thus, $J_n$ is a bounded function that satisfies the Bellman equation. By Theorem 3.3 it follows that $J_n = J$.
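A corresponding sketch of policy function iteration, again for a finite tabular problem and with names of our own choosing, is given below. Policy evaluation computes $J_n$ by iterating the mapping $F_n$ to convergence, and policy improvement picks a control attaining $F J_n$; the loop stops once the policy no longer changes.

```python
def policy_iteration(states, controls, shocks, probs, g, f, beta, tol=1e-10):
    """Alternate policy evaluation and policy improvement until the policy is stable."""

    def expected(J, x, u):
        # E[ g(x, u, z) + beta * J(f(x, u, z)) ] over the shock distribution
        return sum(p * (g(x, u, z) + beta * J[f(x, u, z)]) for z, p in zip(shocks, probs))

    policy = {x: next(iter(controls(x))) for x in states}     # any feasible initial policy
    while True:
        # Policy evaluation: J_n(x) = E[ g(x, pi_n(x), z) + beta J_n(f(x, pi_n(x), z)) ]
        J = {x: 0.0 for x in states}
        while True:
            J_new = {x: expected(J, x, policy[x]) for x in states}
            done = max(abs(J_new[x] - J[x]) for x in states) < tol
            J = J_new
            if done:
                break
        # Policy improvement: choose a control attaining F J_n at every state
        new_policy = {x: max(controls(x), key=lambda u: expected(J, x, u)) for x in states}
        if new_policy == policy:                              # policy unchanged: it is optimal
            return J, policy
        policy = new_policy
```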

4. Applications in economics

Having laid out the necessary framework for dynamic programming in finite and infinite time, we now motivate the use of the theory by applying it to a number of problems concerning economic theory. Below we cover four examples. In the first two, we use dynamic programming simply as a tool to derive two classic results in economics: Milton Friedman's permanent income hypothesis and James Tobin's q-theory of investment.¹ In the last two examples we take a more practical turn and use dynamic programming to find explicit solutions to two standard problems: McCall's (1970) job search model, where an unemployed worker must decide on the optimal timing to accept a job offer, and Brock and Mirman's (1972) stochastic growth model, where a benevolent social planner must decide on the welfare-maximising allocation of consumption, investment, and labour supply.

¹ Both Friedman and Tobin received the Nobel Memorial Prize in Economic Sciences partly due to these results.

4.1 The permanent income hypothesis

The permanent income hypothesis concerns the issue of dividing consumption between the present and the future for a household facing an uncertain income stream. The hypothesis was first discussed by Fisher (1930) and Friedman (1957), while Hall (1978) formalised it mathematically. According to the hypothesis, households smooth consumption across time by forming expectations of their total lifetime income (or permanent income) and then setting current consumption as an appropriate fraction of that income.

We illustrate this hypothesis with the use of dynamic programming below.

Consider an infinitely-lived household that gets utility from consumption $c$ according to a function $u(c)$ which is strictly increasing, continuously differentiable, and strictly concave. We naturally restrict consumption to be non-negative: $c \geq 0$. In each period $t$, the household earns a random wage $w_t$ which is independent over time and drawn from the set $W = \{w^1, \ldots, w^N\}$. The household wishes to maximise the expectation of its discounted lifetime utility given by
$$\mathbb{E}_0\left[\sum_{t=0}^{\infty} \beta^t u(c_t)\right],$$
where $\beta \in (0, 1)$ is a subjective discount factor and $\mathbb{E}_0$ denotes the expectation over $w_t$, conditioned on the information available in time 0. The expectation is necessary here because future values of consumption are stochastic, as they depend on the wage realisations $w_t$.

Lastly, the household also has the opportunity to lend and borrow assets freely at some constant interest rate $r$. For all $t$, this yields the per-period budget constraint
$$a_{t+1} + c_t = (1 + r) a_t + w_t. \qquad (4.1)$$
Starting in period $t$ and using recursion forward, we can rewrite Equation (4.1) as
$$\sum_{s=0}^{\infty} \left(\frac{1}{1+r}\right)^{s} c_{t+s} = \sum_{s=0}^{\infty} \left(\frac{1}{1+r}\right)^{s} w_{t+s} + a_t,$$
and since $c_t \geq 0$ and $w_t \geq \min W$ for all $t$, we therefore have the borrowing constraint
$$a_t \geq -\sum_{s=0}^{\infty} \left(\frac{1}{1+r}\right)^{s} \min W \equiv -b.$$
It follows from Equation (4.1) that $a_{t+1} \in \bigl[-b,\, (1+r)a_t + w_t\bigr]$ and $c_t \in \bigl[0,\, (1+r)a_t + w_t + b\bigr]$. Hence, we are optimising a continuous function over a non-empty, compact, and convex set, so a maximum exists by Weierstrass' maximum theorem. By strict concavity of $u(\cdot)$, this maximum is unique. For $a_0$ and $w_0$ given, we can therefore write the household optimisation problem as

$$\max_{\{c_t\}_{t=0}^{\infty}} \; \mathbb{E}_0\left[\sum_{t=0}^{\infty} \beta^t u(c_t)\right]$$
subject to
$$c_t \in \bigl[0,\, (1+r)a_t + w_t + b\bigr], \quad t = 0, 1, 2, \ldots,$$
$$w_t \in W, \quad t = 0, 1, 2, \ldots,$$
$$a_{t+1} = (1+r)a_t + w_t - c_t, \quad t = 0, 1, 2, \ldots,$$
$$a_0 \text{ and } w_0 \text{ given}.$$

Alternatively, by Equation (4.1) we can write $c_t = (1+r)a_t + w_t - a_{t+1}$, substitute for $c_t$ in $u(c_t)$, and maximise over $a_{t+1}$ instead of $c_t$. It turns out that this approach is easier here. We thus write the Bellman equation for this problem as
$$J(a_t, w_t) = \max_{a_{t+1} \in [-b,\, (1+r)a_t + w_t]} \mathbb{E}_t\Bigl[u\bigl((1+r)a_t + w_t - a_{t+1}\bigr) + \beta J(a_{t+1}, w_{t+1})\Bigr].$$
Differentiating the right-hand side of the Bellman equation, the first-order condition for an internal solution of the maximisation problem gives
$$-u'\bigl((1+r)a_t + w_t - a_{t+1}\bigr) + \beta\, \mathbb{E}_t\!\left[\frac{\partial J(a_{t+1}, w_{t+1})}{\partial a_{t+1}}\right] = 0,$$
where $u'$ denotes the derivative of $u$. Now, by definition we have
$$J(a_t, w_t) = \max_{\{a_{t+s+1}\}_{s=0}^{\infty}} \mathbb{E}_t\left[\sum_{s=0}^{\infty} \beta^s u\bigl((1+r)a_{t+s} + w_{t+s} - a_{t+s+1}\bigr)\right],$$
so
$$\frac{\partial J(a_t, w_t)}{\partial a_t} = (1+r)\, u'\bigl((1+r)a_t + w_t - a_{t+1}\bigr).$$


Substituting this into the first-order condition and rearranging terms, we get the necessary optimality condition
$$u'\bigl((1+r)a_t + w_t - a_{t+1}\bigr) = \beta(1+r)\, \mathbb{E}_t\Bigl[u'\bigl((1+r)a_{t+1} + w_{t+1} - a_{t+2}\bigr)\Bigr]. \qquad (4.2)$$
We can solve Equation (4.2) using policy function iteration: conjecture the policy $a_{t+2} = \pi_0(a_{t+1})$ such that for $a_t$ and $w_t$ given, Equation (4.2) becomes a function in $a_{t+1}$ only. Solving for $a_{t+1}$ yields a solution to the right-hand side of the Bellman equation, so this solution gives a policy update $a_{t+1} = \pi_1(a_t)$. Iterating until convergence yields the optimal policy. Moreover, by the budget constraint (4.1), we can write Equation (4.2) as
$$u'(c_t) = \beta(1+r)\, \mathbb{E}_t\bigl[u'(c_{t+1})\bigr]. \qquad (4.3)$$
This equation, called the consumption Euler equation, is the cornerstone of modern macroeconomics² and captures precisely the permanent income hypothesis: consumption today is not only based on current income but on the expected income and consumption in future years, and households smooth consumption across time accordingly.

² Indeed, Ljungqvist and Sargent (2012) call it "the common ancestor" of all macroeconomics.
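As a rough numerical illustration of this problem (not part of the thesis), one can discretise assets and apply value function iteration from Section 3.3. The sketch below assumes log utility, a two-point wage distribution, and a non-negative asset grid, so it abstracts from borrowing; all parameter values are our own choices.

```python
import math

beta, r = 0.95, 0.04
wages, probs = [1.0, 2.0], [0.5, 0.5]             # two-point wage distribution (assumed)
grid = [0.2 * i for i in range(51)]               # asset grid on [0, 10]; borrowing ruled out here

def u(c):                                         # log utility (an assumed functional form)
    return math.log(c) if c > 0 else -1e12

# J[i][k]: value of entering the period with assets grid[i] and current wage wages[k].
J = [[0.0] * len(wages) for _ in grid]
policy = [[0] * len(wages) for _ in grid]
for _ in range(300):                              # value function iteration (Section 3.3, method II)
    J_new = [[0.0] * len(wages) for _ in grid]
    for i, a in enumerate(grid):
        for k, w in enumerate(wages):
            cash = (1 + r) * a + w                # resources available, from Equation (4.1)
            best_val, best_j = -1e18, 0
            for j, a_next in enumerate(grid):
                if a_next > cash:
                    break                         # consumption would be negative beyond this point
                ev = sum(p * J[j][m] for m, p in enumerate(probs))   # E_t J(a', w')
                val = u(cash - a_next) + beta * ev
                if val > best_val:
                    best_val, best_j = val, j
            J_new[i][k], policy[i][k] = best_val, best_j
    J = J_new
# The implied consumption rule c = cash - grid[policy[i][k]] satisfies the Euler
# equation (4.3) approximately, up to the coarseness of the asset grid.
```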

4.2 Optimal investment and Tobin’s q

The “q-theory of investment” is a canonical model of firm investment first introduced by Tobin (1969), where q is the ratio of the average market value of a firm’s capital to its replacement cost. Tobin argues that investment is positively related to this q: if q is greater than one, capital has more value within the firm than outside it, so the firm should invest in more capital, and vice versa if q is less than one. Hayashi (1980) later established that investment in fact should relate to a marginal q. That is, q is the ratio of the market value of new additional capital to its replacement cost. It turns out that we can derive Tobin’s marginal q using dynamic programming.

Consider a firm which uses capital $K$ to earn revenue according to some function $\pi(K)$ which is strictly increasing, continuously differentiable, strictly concave, and bounded from above. We naturally restrict capital to be non-negative: $K \geq 0$. In each period, there is a random demand shock $z$ which is independent over time, drawn from the set $Z = \{z^1, \ldots, z^N\}$, and adjusts revenues $\pi$ linearly. The firm can choose to invest some amount $I$ in new capital each period at a constant price $p$. In addition to the direct cost $pI$ of investment, investment is also subject to adjustment costs captured by the function $\Phi(I)$, where $\Phi$ is strictly increasing, continuously differentiable, and strictly convex. We additionally assume that $\Phi(0) = \Phi'(0) = 0$, where $\Phi'$ denotes the derivative of $\Phi$. The firm seeks to maximise the expectation of discounted total profits given by

$$\mathbb{E}_0\left[\sum_{t=0}^{\infty} \left(\frac{1}{1+r}\right)^{t} \bigl(z_t \pi(K_t) - p I_t - \Phi(I_t)\bigr)\right],$$
where $r$ is a constant interest rate and $\mathbb{E}_0$ denotes the expectation over $z_t$, conditioned on the information available in time 0. Lastly, capital is assumed to depreciate at a rate $\delta \in (0, 1)$ such that $K_{t+1} = (1 - \delta)K_t + I_t$.

By non-negativity of $K_t$, we then necessarily have $I_t \in \bigl[-(1-\delta)K_t, +\infty\bigr)$. For $K_0$ and $z_0$ given, this yields the optimisation problem
$$\max_{\{I_t\}_{t=0}^{\infty}} \; \mathbb{E}_0\left[\sum_{t=0}^{\infty} \left(\frac{1}{1+r}\right)^{t} \bigl(z_t \pi(K_t) - p I_t - \Phi(I_t)\bigr)\right]$$
subject to
$$I_t \in \bigl[-(1-\delta)K_t, +\infty\bigr), \quad t = 0, 1, 2, \ldots,$$
$$z_t \in Z, \quad t = 0, 1, 2, \ldots,$$
$$K_{t+1} = (1-\delta)K_t + I_t, \quad t = 0, 1, 2, \ldots,$$
$$K_0 \text{ and } z_0 \text{ given}.$$

As in the case of the permanent income hypothesis, we can write $I_t = K_{t+1} - (1-\delta)K_t$, substitute for $I_t$ in the profit function, and optimise over $K_{t+1}$ instead of $I_t$. Per-period profits are then $z_t \pi(K_t) - p\bigl(K_{t+1} - (1-\delta)K_t\bigr) - \Phi\bigl(K_{t+1} - (1-\delta)K_t\bigr)$. Moreover, by boundedness of $\pi$ we necessarily have that $\bar{K} \equiv \arg\max_{K \geq 0}\, \pi(K) - p\delta K$ is finite. For any $K_t \in [0, \bar{K}]$, it can never be optimal to choose $K_{t+1} > \bar{K}$: by reducing next-period capital to $K_{t+1} = \bar{K}$, the firm saves more in investment costs than it forgoes in expected future revenue, since $\bar{K}$ maximises $\pi(K) - p\delta K$. Thus, without loss of generality we can assume that $K_{t+1} \in [0, \bar{K}]$. Again, we are maximising a continuous and strictly concave function over a non-empty, compact, and convex set, so a unique maximum exists. It follows that the Bellman equation is
$$J(K_t, z_t) = \max_{K_{t+1} \in [0, \bar{K}]} \mathbb{E}_t\left[z_t \pi(K_t) - p\bigl(K_{t+1} - (1-\delta)K_t\bigr) - \Phi\bigl(K_{t+1} - (1-\delta)K_t\bigr) + \frac{1}{1+r}\, J(K_{t+1}, z_{t+1})\right].$$

The solution procedure is completely analogous to the case of the permanent income hypothesis. The first-order condition for an internal solution of the right-hand side is
$$-p - \Phi'\bigl(K_{t+1} - (1-\delta)K_t\bigr) + \frac{1}{1+r}\, \mathbb{E}_t\!\left[\frac{\partial J(K_{t+1}, z_{t+1})}{\partial K_{t+1}}\right] = 0.$$
By the definition of the value function,
$$\frac{\partial J(K_t, z_t)}{\partial K_t} = z_t \pi'(K_t) + (1-\delta)\Bigl(p + \Phi'\bigl(K_{t+1} - (1-\delta)K_t\bigr)\Bigr),$$
where $\pi'$ denotes the derivative of $\pi$. Substituting this into the first-order condition, rearranging terms, and using $I_t = K_{t+1} - (1-\delta)K_t$, we get the optimality condition
$$p + \Phi'(I_t) = \frac{1}{1+r}\, \mathbb{E}_t\Bigl[z_{t+1}\pi'(K_{t+1}) + (1-\delta)\bigl(p + \Phi'(I_{t+1})\bigr)\Bigr]. \qquad (4.4)$$

Again, this can be solved using policy function iteration to find the optimal investment.
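A numerical sketch analogous to the household example can likewise illustrate the firm's problem. The functional forms below (square-root revenue and quadratic adjustment costs) and all parameter values are our own assumptions rather than part of the thesis.

```python
import math

r, delta, p = 0.05, 0.10, 1.0
shocks, probs = [0.9, 1.1], [0.5, 0.5]            # demand shocks (assumed two-point distribution)

def revenue(K):                                    # pi(K): an assumed revenue function
    return math.sqrt(K)

def adj_cost(I):                                   # quadratic adjustment cost (assumed)
    return 0.5 * I * I

grid = [0.2 * i for i in range(1, 51)]             # capital grid on (0, 10] (assumed upper bound)

# J[i][s]: value of entering the period with capital grid[i] and demand shock shocks[s].
J = [[0.0] * len(shocks) for _ in grid]
for _ in range(300):                               # value function iteration on (K, z)
    J_new = [[0.0] * len(shocks) for _ in grid]
    for i, K in enumerate(grid):
        for s, z in enumerate(shocks):
            best = -1e18
            for j, K_next in enumerate(grid):
                I = K_next - (1 - delta) * K       # investment implied by the capital choice
                ev = sum(q * J[j][m] for m, q in enumerate(probs))   # E_t J(K', z')
                best = max(best, z * revenue(K) - p * I - adj_cost(I) + ev / (1 + r))
            J_new[i][s] = best
    J = J_new
```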

Moreover, if we define
$$q_t \equiv \frac{1}{p}\left(\frac{1}{1+r}\, \mathbb{E}_t\!\left[\frac{\partial J(K_{t+1}, z_{t+1})}{\partial K_{t+1}}\right]\right)$$
