
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2019

Machine learning and quantum mechanics

NATALIA ERMAKOVA
ALICIA BRÅTNER

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES


Abstract

This thesis aims to use machine learning to solve for the ground state energy of the quantum system corresponding to the particle in a box. A radial basis function (RBF) network with Gaussian activation functions is used as the variational wave function. The weights in the network are updated so that the energy expectation value is minimized, which is carried out using the variational Monte Carlo (VMC) method. The machine learning method succeeds in finding the ground state energy of the particle in a box. The method also works when a perturbation in the form of a linear potential is added to the infinite potential well.


Sammanfattning

The purpose of this report is to show how machine learning can be used to approximate the ground state energy of the quantum system corresponding to the particle in a box. A radial basis function network is used as a representation of the variational wave function. The weights in the network are updated so that the expectation value of the energy is minimized. The energy minimization is carried out using the variational Monte Carlo method. The presented method gives a good approximation of the ground state energy of the particle in a box. The method also works when a perturbation in the form of a linear potential is added to the system.


Contents

1 Introduction
2 Background Material
  2.1 Artificial neural network
  2.2 Particle in a box
  2.3 Quantum-mechanical expectation values
  2.4 Metropolis algorithm
  2.5 Pseudoinverse
  2.6 Regularization
  2.7 Variational Monte Carlo method
3 Investigation
  3.1 Model
  3.2 Results
    3.2.1 The effect of the learning rate
    3.2.2 The effect of the number of hidden layer neurons
    3.2.3 Perturbation with linear potential
  3.3 Discussion
4 Summary and Conclusions
5 Acknowledgements
Appendices


Chapter 1

Introduction

Machine learning is a rapidly developing field in research and industry. Recently it has also been applied to study physical phenomena [4][1]. Applying machine learning methods within quantum mechanics is therefore an interesting and timely problem.

Machine learning, and artificial neural networks in particular, can be used to solve problems within quantum mechanics. This is shown in research papers, some of which are used as references in this thesis. Teng's paper demonstrates how an RBF network can be used to find the ground state energy of a particle in a box and of a harmonic oscillator [9]. Teng shows that the ground state energy of a particle can be found by using an artificial neural network together with a variational method. In his paper, Teng demonstrates that the parameters of the variational wave function can be used as parameters of the neural network and thus adjusted in the process of training the network. The method described in Teng's paper is used in this thesis.

Another example is Carleo and Troyer's research article, which describes the advantages of using machine learning when solving quantum many-body problems [2]. Carleo and Troyer discuss the difficulties in finding the ground state energy of a many-body system with interacting fermions and the possibility of reducing the exponential complexity by using machine learning.

This thesis focuses on using artificial neural networks (ANN) as a tool to solve simple problems within quantum mechanics, such as finding the ground state energy of a particle in an infinite square well. The possibility of obtaining a correct solution is investigated, as well as the precision of the solution compared to other methods of solving the same problem. The effect of the parameters of the network is also examined.


Chapter 2

Background Material

2.1 Artificial neural network

The main idea behind an artificial neural network is to mimic a biological brain, that is, to be able to recognize patterns and to learn by training [10]. The name artificial neural network (ANN) acts as a collective name for a variety of algorithms that do this. Common to most of these algorithms is that they consist of neurons, which carry numerical values and are interconnected via synaptic couplings. The strength of the couplings is modeled by weight functions, and the learning process corresponds to adjusting the weights. As in a biological brain, different neurons are activated for different stimuli or inputs, where the degree of activation is determined by the numerical value of the weight function [7].

The numerical value in each neuron is calculated according to an activation function, and depends on the values of the previous neurons and the weights between them. A neural network often has at least three layers of neurons: an input layer, a hidden layer and an output layer, as seen in Fig. 2.1. In a feed-forward network, the calculations are carried out in order, starting from the input layer and ending in the output layer. In Fig. 2.1, $n$ represents the input to the network and $c_i$ the weights between the input layer and the hidden layer. The activation functions are denoted $\rho_i$ and determine the values of the hidden neurons. Further, $a_i$ represent the weights between the hidden layer and the output layer, and the sum represents the output value as a weighted sum of the previous weights and neurons, so that

$$\mathrm{output} = \sum_{i}^{n} a_i \rho_i. \qquad (2.1)$$

Equation 2.1 is valid for radial basis function (RBF) networks. An RBF network uses activation functions that depend only on the distance from a certain point, for example Gaussian functions. Other types of neural networks can generally be structured in many other ways, with different relations between the neurons. For example, it is common to use multiple input and output neurons or several hidden layers. The activation function also varies between different networks. In order for the neural network to learn, it is also equipped with a learning rule which aims to change the network's input-output behavior. More concretely, the learning rule adjusts the weights of the network. Each time the network is fed with information the learning rule is applied so that the weights change for the next run. The network can then be iteratively trained until the desired behavior is achieved [7].

The learning rate is a hyperparameter, i.e. a parameter that is set to a certain value before training. The learning rate determines how much the parameters of the network are adjusted in each iteration [11]. It is important to choose a suitable learning rate, because too small a value can lead to slow convergence, while too large a value can cause instability in the convergence and unreliable results.
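As a minimal illustration of the RBF output in Eq. 2.1, the sketch below evaluates a small network with Gaussian activations in Python; the array names and sizes are chosen only for illustration and are not taken from the thesis code.

import numpy as np

def rbf_output(n, a, b, c):
    # Gaussian activations rho_i centered at c_i with width parameters b_i
    rho = np.exp(-np.abs(b) * np.abs(n - c) ** 2)
    # weighted sum over the hidden neurons, Eq. 2.1
    return np.sum(a * rho)

# example: 6 hidden neurons with random weights and centers
rng = np.random.default_rng(0)
a, b, c = rng.random(6), np.ones(6), rng.random(6)
print(rbf_output(1.0, a, b, c))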

Figure 2.1: Illustration of a neural network.

2.2 Particle in a box

The one-dimensional model of the "particle in a box" describes a particle confined in a well with infinite potential walls on each side, and is used to illustrate quantum mechanical phenomena. The potential inside the well is zero, so the particle moves freely there. A mathematical model of the potential for a well of length L is given by

$$V(x) = \begin{cases} 0, & \text{if } 0 < x < L, \\ \infty, & \text{otherwise.} \end{cases} \qquad (2.2)$$

Since the walls are impenetrable the wave function is ψ = 0 outside the well. By solving the time-independent Schrödinger equation,

$$-\frac{\hbar^2}{2m}\frac{\partial^2 \psi(x)}{\partial x^2} + V(x)\psi(x) = E\psi(x), \qquad (2.3)$$

and using continuity at both ends of the well, ψ(0) = ψ(L) = 0, and normalizing ψ, the solutions read

$$\psi_n(x) = \sqrt{\frac{2}{L}} \sin\left(\frac{n\pi}{L}x\right), \qquad (2.4)$$

with eigenvalues

$$E_n = \frac{n^2\pi^2\hbar^2}{2mL^2}, \qquad (2.5)$$

where m is the mass of the particle. In natural units and with length L = 1 and m = 1, Eq. 2.5 simplifies to

$$E_n = \frac{n^2\pi^2}{2}. \qquad (2.6)$$

The ground state energy is then given by $E_1 = \pi^2/2$, with the first 10 decimals given by $E_1 = 4.9348022005$.

This thesis also considers a particle in a box with a perturbation in the form of a linear potential. With this potential, the Hamiltonian reads

$$\hat{H} = \frac{\hat{p}^2}{2} + V(x) + a\hat{x} = \hat{H}_0 + a\hat{x}, \qquad (2.7)$$

where a is the force constant of the perturbation. Using first order perturbation theory, the energy with perturbation takes the form

$$E_n = \frac{n^2\pi^2}{2} + \langle n | a\hat{x} | n \rangle = \frac{n^2\pi^2}{2} + \frac{aL}{2}. \qquad (2.8)$$
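As a quick numerical check of Eqs. (2.6) and (2.8), the short sketch below evaluates the unperturbed levels and the first-order perturbed energies in natural units with L = 1 (the function name is illustrative):

import numpy as np

def energy(n, a=0.0, L=1.0):
    # unperturbed level (Eq. 2.6) plus the first-order shift a*L/2 (Eq. 2.8)
    return n**2 * np.pi**2 / 2 + a * L / 2

print(energy(1))         # ground state, approximately 4.9348022005
print(energy(1, a=2.0))  # first-order estimate of the perturbed ground state for a = 2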

2.3 Quantum-mechanical expectation values

Generally, the expectation value of an observable A is given by

$$\langle A \rangle = \frac{\int_{-\infty}^{\infty} \Psi^* A \Psi \, dx}{\int_{-\infty}^{\infty} \Psi^* \Psi \, dx} = \frac{\langle \Psi | A\Psi \rangle}{\langle \Psi | \Psi \rangle}, \qquad (2.9)$$

where Ψ is a wave function that need not be normalized [5].

In this thesis the energy of a particle in a box with infinite potential walls is studied. The time-independent Schrödinger equation can be written as

$$\hat{H}\psi = E\psi. \qquad (2.10)$$

Therefore, the values of E can be seen as eigenvalues of $\hat{H}$ and the functions ψ as eigenfunctions of $\hat{H}$.

As seen in Eq. 2.6, the Hamiltonian of the particle in a box with infinite potential walls has discrete eigenvalues. Therefore it is of interest to look at expectation values of operators with a discrete spectrum. The expectation value of an operator A in a general state

$$|\psi\rangle = \sum_n c_n |f_n\rangle = \sum_n \langle f_n|\psi\rangle\, |f_n\rangle, \qquad (2.11)$$

with normalized eigenstates $|f_n\rangle$ and eigenvalues $a_n$, is given by

$$\langle A \rangle = \sum_n a_n\, |\langle f_n|\psi\rangle|^2, \qquad (2.12)$$

where $P_n = |\langle f_n|\psi\rangle|^2$ is the probability to find the system in state $f_n$ when measuring A [5].
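As a small numerical illustration of Eq. (2.12), the sketch below uses made-up expansion coefficients and eigenvalues (not data from the thesis):

import numpy as np

c = np.array([0.8, 0.5, 0.2])      # hypothetical amplitudes <f_n|psi>
a_n = np.array([1.0, 4.0, 9.0])    # hypothetical eigenvalues of A

P = np.abs(c)**2 / np.sum(np.abs(c)**2)   # probabilities P_n = |<f_n|psi>|^2
print(np.sum(a_n * P))                    # expectation value <A>, Eq. (2.12)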


2.4 Metropolis algorithm

The Metropolis algorithm is a Markov chain Monte Carlo method which can be used to compute expressions of the form

$$\langle f \rangle = \frac{\int f(x)\,p(x)\,dx}{\int p(x)\,dx}. \qquad (2.13)$$

The algorithm involves constructing a random walk, i.e. a sequence of randomly chosen points (steps) $\{x_i\}$, such that their asymptotic probability distribution approaches p(x) after many steps. The random walk is defined by a transition probability $T(x \to x')$, which is the probability of the walk taking a step from x to x'. A common choice of transition probability is [6]

$$T(x \to x') = \min\left(1, \frac{p(x')}{p(x)}\right). \qquad (2.14)$$

After a random walk of N steps has been generated, the expectation value is evaluated by

$$\langle f \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i). \qquad (2.15)$$
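A minimal sketch of the Metropolis estimate of Eqs. (2.13)-(2.15); the target density p(x) proportional to exp(-x^2) and the observable f(x) = x^2 are chosen only as an example:

import numpy as np

rng = np.random.default_rng(1)
p = lambda x: np.exp(-x**2)   # unnormalized probability density (example)
f = lambda x: x**2            # observable to average (example)

x, total = 0.0, 0.0
N = 100000
for _ in range(N):
    x_new = x + rng.normal()             # propose a random step
    if p(x_new) / p(x) > rng.random():   # accept with probability min(1, p'/p), Eq. (2.14)
        x = x_new
    total += f(x)
print(total / N)   # estimate of <f>, Eq. (2.15); should be close to 0.5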

2.5 Pseudoinverse

The pseudoinverse, or Moore-Penrose inverse, is a generalization of the inverse of a matrix. The pseudoinverse $A^+$ of an $m \times n$ matrix A with entries in R or C must satisfy the following conditions (* denotes the conjugate transpose):

1. $AA^+A = A$
2. $A^+AA^+ = A^+$
3. $(AA^+)^* = AA^+$
4. $(A^+A)^* = A^+A$

These conditions guarantee the existence and uniqueness of $A^+$. It follows that if A is invertible, the pseudoinverse equals the inverse, $A^+ = A^{-1}$. The Moore-Penrose inverse also provides an inverse for singular matrices. The numpy library contains a built-in pseudoinverse function which is computed using the singular value decomposition [8].
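A short check of the four conditions using the numpy pseudoinverse (the matrix below is arbitrary and chosen only for illustration):

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # a non-square matrix with no ordinary inverse
A_plus = np.linalg.pinv(A)          # Moore-Penrose pseudoinverse via the SVD

print(np.allclose(A @ A_plus @ A, A))                    # condition 1
print(np.allclose(A_plus @ A @ A_plus, A_plus))          # condition 2
print(np.allclose((A @ A_plus).conj().T, A @ A_plus))    # condition 3
print(np.allclose((A_plus @ A).conj().T, A_plus @ A))    # condition 4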

2.6 Regularization

As will be shown in Section 2.7, an inverse of the covariance matrix is used to adjust the parameters of the network. The inverse of the covariance matrix does not always exist, and therefore the Moore-Penrose pseudoinverse is used. However, even with the pseudoinverse, small values on the matrix diagonal can lead to unstable energy values. Therefore a matrix regularization is used, i.e. an additional term is added to the matrix diagonal. The regularized diagonal elements are given by

$$S_{ii}' = S_{ii} + r(k)\,S_{ii}, \qquad (2.16)$$

where S denotes the covariance matrix and r(k) is a parameter which is a function of the iteration number k [3].
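A sketch of the diagonal regularization in Eq. (2.16), here with the explicit schedule r(k) that is specified later in Eq. (2.24):

import numpy as np

def regularize(S, k):
    # scale the diagonal: S_ii -> S_ii + r(k) * S_ii, Eq. (2.16)
    r = max(100.0 * 0.9**k, 1e-4)   # r(k) as in Eq. (2.24)
    S_reg = S.copy()
    S_reg[np.diag_indices_from(S_reg)] *= (1.0 + r)
    return S_reg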


2.7 Variational Monte Carlo method

The variational Monte Carlo (VMC) method is a way to find an approximation of the ground state energy of a system. In this method an ansatz for the wave function is used: it is assumed that the wave function is equal to some function $\psi(\lambda)$, where λ is a set of variational parameters. If $\hat{H}$ is the Hamiltonian operator, the energy expectation value can be written as

$$E(\lambda) = \frac{\langle \psi(\lambda) | \hat{H} | \psi(\lambda) \rangle}{\langle \psi(\lambda) | \psi(\lambda) \rangle}. \qquad (2.17)$$

This value can be approximated by using the Metropolis algorithm [9].

After the energy expectation value has been evaluated it is minimized by changing the parameters in the variational wave function. In this thesis the stochastic reconfiguration method will be used to adjust the parameters [3].

To use the stochastic reconfiguration method, the variational derivatives need to be introduced:

$$O_i(n) = \frac{\partial_{\lambda_i} \psi_\lambda(n)}{\psi_\lambda(n)}. \qquad (2.18)$$

Here, n represents a configuration of the system, and $O_i(n)$ is the derivative of the wave function ansatz with respect to the ith parameter, divided by the ansatz itself.

To update the parameters, the covariance matrix S and the force vector F need to be introduced. They are defined by

$$S_{ij} = \langle O_i O_j \rangle - \langle O_i \rangle \langle O_j \rangle, \qquad (2.19)$$

$$F_i = \langle E_{\mathrm{local}} O_i \rangle - \langle E_{\mathrm{local}} \rangle \langle O_i \rangle. \qquad (2.20)$$

The elements of the S-matrix represent the covariance between $O_i$ and $O_j$, that is, the linear relationship between them. F can be interpreted as a force that pulls the parameters towards the energy minimum. In these equations $\langle \dots \rangle$ denotes expectation values, which are also evaluated using the Metropolis algorithm [9].

The local energy is introduced and defined by

$$E_{\mathrm{local}} = \frac{\langle n | \hat{H} | \psi(\lambda) \rangle}{\langle n | \psi(\lambda) \rangle} = \frac{\sum_{n'} \langle n | \hat{H} | n' \rangle \langle n' | \psi(\lambda) \rangle}{\langle n | \psi(\lambda) \rangle}, \qquad (2.21)$$

where the matrix elements of the perturbation for the particle in the box are [9]

$$\langle n | a\hat{x} | n' \rangle = \begin{cases} \dfrac{4a\,[(-1)^{n+n'} - 1]\, n n'}{(n - n')^2 (n + n')^2 \pi^2}, & \text{if } n \neq n', \\[2mm] 0.5\,a, & \text{if } n = n'. \end{cases} \qquad (2.22)$$

A regularization is used for the diagonal elements of the matrix S,

$$S_{ii}' = S_{ii} + r(k)\,S_{ii}. \qquad (2.23)$$

Here, k is the number of the current iteration and r(k) is given by

$$r(k) = \max\left(100 \cdot 0.9^k,\ 10^{-4}\right). \qquad (2.24)$$

The parameters are then updated according to the stochastic reconfiguration method,

$$\lambda_j' = \lambda_j - \alpha\, S_{ij}^{-1} F_i. \qquad (2.25)$$

Here, α represents the learning rate of the network and $S_{ij}^{-1}$ denotes the Moore-Penrose pseudoinverse, since the covariance matrix is not always invertible. As a result the value of each parameter of the variational wave function is updated, with $\lambda_j'$ as the element with index j of the new parameter vector and $\lambda_j$ as the old. The vector F can be seen as a force that pulls the parameters towards the energy minimum, and becomes 0 when the minimum has been found.
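A minimal sketch of one stochastic reconfiguration update, Eqs. (2.19)-(2.25), assuming that the variational derivatives O (one row per sample, one column per parameter) and the local energies E_loc have already been sampled; the variable names are illustrative:

import numpy as np

def sr_update(params, O, E_loc, k, alpha=0.05):
    # O: (n_samples, n_params) sampled variational derivatives O_i, Eq. (2.18)
    # E_loc: (n_samples,) sampled local energies, Eq. (2.21)
    O_mean = O.mean(axis=0)
    S = O.T @ O / len(O) - np.outer(O_mean, O_mean)                  # Eq. (2.19)
    F = (E_loc[:, None] * O).mean(axis=0) - E_loc.mean() * O_mean    # Eq. (2.20)
    r = max(100.0 * 0.9**k, 1e-4)                                    # Eq. (2.24)
    S[np.diag_indices_from(S)] *= (1.0 + r)                          # Eq. (2.23)
    return params - alpha * np.linalg.pinv(S) @ F                    # Eq. (2.25)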


Chapter 3

Investigation

3.1 Model

This thesis concentrates on neural networks consisting of three layers of neurons: an input layer, a hidden layer and an output layer, as seen in Fig. 2.1. The network has one input neuron and one output neuron, while different numbers of hidden neurons are examined. From a given input n in the input layer, the values of the other neurons are calculated as follows. The value of each neuron in the hidden layer is calculated using an activation function $\rho_i$, which depends on the variable n and the parameter $c_i$.

As stated before, the output of the neural network is given by

$$\mathrm{output} = \sum_{i}^{M} a_i \rho_i, \qquad (3.1)$$

where M is the number of neurons in the hidden layer. In this thesis Gaussian functions are used as activation functions, so that

$$\rho_i(\|x - c_i\|) = e^{-|b_i|\,|x - c_i|^2}. \qquad (3.2)$$

The ansatz for the amplitude function is set to

$$\psi(n_1, n_2, \dots, n_p, a, b, c) = \sum_{i}^{M} a_i\, e^{-|b_i|\,|n - c_i|^2}. \qquad (3.3)$$

Thus, $a_i$, $b_i$, $c_i$ are the ANN parameters, which from now on will be denoted λ. They can also be seen as the variational parameters in the VMC method. Thus, the parameters $a_i$, $b_i$, $c_i$ of the variational wave function become parameters of the neural network, associated with the weights between the neurons of the network. Since the one-dimensional particle in a box is considered, n and $c_i$ are scalars. Furthermore, n is restricted by $1 \leq n \leq n_{\max} = 20$ in order to avoid an infinite sum.

Now the wave function can be expressed as a linear combination of the energy eigenbasis $|n\rangle$:

$$|\psi\rangle = \sum_{n=1}^{20} \psi(n)\, |n\rangle, \qquad (3.4)$$


where ψ(n) is the amplitude function in Eq. (3.3). Therefore, the RBF network can be seen as a representation of the amplitude function.

The amplitude function can now be used to estimate the energy value. As previously mentioned,

$$E(\lambda) = \frac{\langle \psi(\lambda)|\hat{H}|\psi(\lambda)\rangle}{\langle \psi(\lambda)|\psi(\lambda)\rangle} = \frac{\sum_{n,n'} \langle\psi|n\rangle \langle n|\hat{H}|n'\rangle \langle n'|\psi\rangle}{\sum_n \langle\psi|n\rangle\langle n|\psi\rangle} = \frac{\sum_n \psi(n,\lambda)^2 E_{\mathrm{local}}(n,\lambda)}{\sum_n \psi(n,\lambda)^2} = \frac{\int \psi(n,\lambda)^2 E_{\mathrm{local}}(n,\lambda)\,dn}{\int \psi(n,\lambda)^2\,dn} = \langle E\rangle. \qquad (3.5)$$

By comparing with Eq. 2.15 it is easy to see that the energy expectation value can be evaluated using the Metropolis algorithm. When using the Metropolis algorithm in this case, a random move from n to $n'' = n \pm 1$ is performed. The number of samples is set to 10000. The move is either accepted or rejected with a transition probability $T(n \to n'')$ given by

$$T(n \to n'') = \min\left(1, \frac{|\langle n''|\psi(\lambda)\rangle|^2}{|\langle n|\psi(\lambda)\rangle|^2}\right). \qquad (3.6)$$
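A sketch of this eigenstate-space Metropolis walk, which samples n with probability proportional to the squared amplitude and averages the local energy; the amplitude array psi_n and the function e_local are assumed to be given (from Eq. (3.3) and Eqs. (3.8)-(3.9)), and the names are illustrative:

import numpy as np

def sample_energy(psi_n, e_local, n_max=20, n_samples=10000, rng=None):
    # psi_n[n]: amplitude <n|psi> for n = 1, ..., n_max - 1 (index 0 unused)
    rng = rng or np.random.default_rng()
    n, total = 1, 0.0
    for _ in range(n_samples):
        n_new = n + (1 if rng.random() > 0.5 else -1)        # propose n -> n +/- 1
        if 0 < n_new < n_max:
            if (psi_n[n_new] / psi_n[n])**2 > rng.random():  # accept, Eq. (3.6)
                n = n_new
        total += e_local(n)
    return total / n_samples   # Metropolis estimate of <E>, cf. Eq. (2.15)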

After the move has been determined, the value of the amplitude function ψ is calculated for this quantum number using Eq. (3.3). The local energy can then be calculated using the formula

$$E_{\mathrm{local}}(n, \lambda) = \frac{\langle n|\hat{H}|\psi\rangle}{\langle n|\psi\rangle} = \frac{\sum_{n'} \langle n|\hat{H}|n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle}. \qquad (3.7)$$

The Hamiltonian of the unperturbed system is $\hat{H}_0 = \frac{\hat{p}^2}{2} + V(x)$. Since the $|n\rangle$ are the energy eigenstates with eigenvalues $E_n$, and $\langle n|n'\rangle = \delta_{nn'}$, $E_{\mathrm{local}}$ simplifies as follows:

$$E_{\mathrm{local}}(n,\lambda) = \frac{\sum_{n'} \langle n|\hat{H}_0|n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle} = \frac{\sum_{n'} E_{n'} \langle n|n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle} = E_n \frac{\langle n|\psi\rangle}{\langle n|\psi\rangle} = E_n = \frac{n^2\pi^2}{2}. \qquad (3.8)$$

A different result is obtained for the perturbed system with Hamiltonian $\hat{H} = \hat{H}_0 + a\hat{x}$. Again using the unperturbed energy eigenbasis,

$$E_{\mathrm{local}}(n,\lambda) = \frac{\sum_{n'} \langle n|\hat{H}_0 + a\hat{x}|n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle} = E_n + \frac{\sum_{n'} \langle n|a\hat{x}|n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle}, \qquad (3.9)$$

where $\langle n|a\hat{x}|n'\rangle$ is calculated according to Eq. (2.22).

The stochastic reconfiguration operators for each parameter can also be calculated, using the ψ obtained in this Metropolis step and the current parameters of the network, according to Eq. (2.18):

$$O_{a_i}(n) = \frac{\rho_i}{\psi}, \qquad (3.10)$$

$$O_{b_i}(n) = -\frac{a_i\, b_i\, |n - c_i|^2}{|b_i|}\, \frac{\rho_i}{\psi}, \qquad (3.11)$$

$$O_{c_i}(n) = 2\, a_i\, |b_i|\, (n - c_i)\, \frac{\rho_i}{\psi}. \qquad (3.12)$$

By using the above equations the expectation values $\langle O_i O_j \rangle$, $\langle O_i \rangle$, $\langle O_j \rangle$ and $\langle E_{\mathrm{local}} O_i \rangle$ are calculated, which are then used to calculate the covariance matrix and the force vector according to Eqs. (2.19) and (2.20). The expectation values can be calculated either with the Metropolis method or exactly. In this thesis, mainly calculations of expectation values using the Metropolis method are presented. However, by following the same procedure as above but exchanging the Metropolis estimates of the expectation values for the exact calculations, the correctness can be checked. The exact expectation values are calculated according to Eq. (2.13):

$$\langle O_i \rangle = \frac{\sum_{n=1}^{20} O_i(n)\, |\langle n|\psi\rangle|^2}{\sum_{n=1}^{20} |\langle n|\psi\rangle|^2}, \qquad (3.13)$$

where n is truncated at $n_{\max} = 20$ and $|\langle n|\psi\rangle|^2 = |\psi(n)|^2$ is the (unnormalized) probability to find the particle in state $|n\rangle$. The sum in the denominator corresponds to the normalizing constant.
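A sketch of the exact evaluation in Eq. (3.13), given a table op[n, i] = O_i(n) and the amplitudes psi_n (both names are illustrative):

import numpy as np

def exact_expectation(op, psi_n):
    # op[n, i] = O_i(n), psi_n[n] = psi(n) for n = 1, ..., n_max - 1
    weights = psi_n[1:]**2   # unnormalized probabilities |<n|psi>|^2
    return (op[1:] * weights[:, None]).sum(axis=0) / weights.sum()   # Eq. (3.13)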

Given that in this case the operators are always real, it follows that $O_i^* = O_i$ for all i. The values of $O_i$ and $O_i O_j$ are calculated for each parameter of the network and stored in a vector and a matrix, respectively.

As the last step of an iteration of the network, each parameter is updated according to Eq. (2.25).

It is expected that by performing the operations described above for a sufficient number of iterations, the energy will be minimized and a correct result will be obtained.

3.2 Results

The results obtained with random initial values of the network parameters, M hidden layer neurons and a learning rate of α = 0.05 were satisfactory and gave an answer close to the analytical value of the ground state energy. The results of some simulations are presented in the figures below. It should also be mentioned that large energy values were cut off and set to a constant value, in order to make the details of the convergence visible. The graphs presented are a selection from many runs with different start parameters. The energy on the vertical axis is given in natural units.


(a) Energy convergence obtained with Metropolis method.

(b) Energy convergence obtained with the exact calculations.

Figure 3.1: The convergence of the ground state energy

As can be seen in Fig. 3.1 above, the value of the ground state energy converges to the exact value $E_1 = 4.9348022005$. It is worth noting that a spike appears in the top graph.

Some problems with the covariance matrix S were encountered. For some values of the initial parameters numpy failed to obtain the pseudoinverse of the covariance matrix and the program crashed. This behaviour is discussed in Section 3.3.

3.2.1 The effect of the learning rate

The convergence for 4 different learning rates α can be seen in Fig. 3.2. As previously mentioned, large energy values are cut off at E = 15 for the sake of convenience. It can be seen that small values of α generally result in slower convergence, while large α gives faster but more unstable convergence.

Figure 3.2: Ground state energy convergence for different learning rates α.

3.2.2 The effect of the number of hidden layer neurons

The convergence with M = 2, 4, 6, 10 hidden neurons can be seen in Fig. 3.3. The energy converges for all numbers of neurons, but is unstable for M = 10. It should also be noted that the graph in Fig. 3.3 is only one of many test runs with different start parameters. After many runs, M = 6 neurons was considered to provide fast and stable convergence.

Figure 3.3: Ground state energy convergence for different numbers of hidden neurons M .


3.2.3 Perturbation with linear potential

Finally, the convergence with the linear perturbation was examined. The result is shown in Fig. 3.4 for a = 0, 2, 4, 8, -8. Here α = 0.05 and M = 6 neurons are used. The expectation values in the graph are calculated using the Metropolis method.

Figure 3.4: Energy convergence with linear perturbations a.

The energy values obtained analytically are taken from Teng's paper [9] and presented in the table below. Only the exact expectation values are presented in the table, since the Metropolis method failed to converge to the same decimal sequence. Even with the exact expectation values it was difficult to obtain the same decimal sequence, but after much fine tuning of the parameters (M and α) and many runs of the network, the values that appeared several times were used; they can be seen in Tab. 3.1. The energy values are not rounded but truncated after the fifth decimal. The energy value for the linear potential a = -8 is not included in the table because the energy converged to different values each time.

a      Exact value    VMC (exact expectation values)
0.0    4.93480        4.93480
2.0    5.92603        5.92603
4.0    6.89974        6.89979
8.0    8.79508        8.79552

Table 3.1: The analytical values compared to the VMC values of the ground state energy of the particle in the box with perturbation.

As can be seen in Fig. 3.4 and Tab. 3.1, the energy converges to values that correspond to the exact values.


3.3 Discussion

Overall the constructed ANN provided results that were close to the ground state energy value obtained analytically. The behaviour of the model with respect to different linear potentials was also reasonable.

The problem with calculating the pseudoinverse of the covariance matrix was possibly due to some choices of initial parameters. For some choices the attempt to minimize the energy led to the wave function being equal to zero. This led to numpy evaluating the stochastic reconfiguration operators as "Not a number", since the stochastic reconfiguration operators are obtained by dividing by the wave function (Eqs. (3.10)-(3.12)).

This leads to the conclusion that the initial parameters should be chosen carefully in order to obtain the best results.

However, it can be stated that, regardless of the choice of starting parameters, the convergence still looks irregular with spikes. This can be seen both in the exact and the Metropolis convergence, though no figure of the exact energy convergence with spikes is included in this thesis. The Metropolis method, however, is much more uneven than the exact method, which is probably due to the stochastic nature of the Metropolis algorithm.

This leads to the suspicion that the problem has to do with the stochastic reconfiguration method Eqs. (2.19) - (2.25), which thus creates instability in some unresolved way.

As stated above, it was noted that with increasing learning rate the convergence was faster but less stable than for smaller values of α. By definition, the learning rate determines how much the parameters are changed in each iteration. Therefore it should be expected that a larger α leads to faster convergence, since the network then learns faster. However, this can also result in instability, since for example the spikes from the randomness of the Metropolis algorithm then affect the system to an even greater extent. For the same reason smaller values of α provided a more stable curve but slower convergence.

The effect of the number of hidden neurons can also be explained. As was mentioned in Sec. 2.1, the output of the ANN is given by $\mathrm{output} = \sum_{i}^{M} a_i \rho_i$, where M denotes the number of hidden layer neurons. As can be seen in this equation, the final output can be viewed as a sum over all the hidden-layer neurons. This can explain the slower convergence for smaller M: if the number of neurons is too small, then the output obtained in each iteration is based on too little information for the network to learn effectively. For the same reason the performance improves for larger M. However, similarly to the case of a larger learning rate, a larger value of M leads to unstable behaviour of the model. This can be explained by the fact that more summands, i.e. more neurons in the hidden layer, give a larger value of ψ and hence of E, which gives higher spikes.

As for the results with the perturbation, it can be stated that the values of the ground state energy obtained with the ANN correspond to the analytical values from perturbation theory. This is a good sign, because it shows that the ANN gives a correct result for different types of problems, and not only for the unperturbed particle in a box.


Chapter 4

Summary and Conclusions

The attempt to evaluate the ground state energy by minimizing the energy expectation value using the neural network gave satisfactory results close to the analytical values. Such results were obtained even when a perturbation in the form of a linear potential was added to the model.

The results show that the method in this thesis is successful in solving problems within quantum mechanics and useful for at least simple one-particle problems. ANNs can potentially be used in a similar manner to efficiently solve problems with many particles.

However, it should be mentioned that the overall performance, e.g. the stability and rate of convergence of the neural network, was greatly dependent on the initial parameters, the learning rate and the number of hidden layer neurons. Therefore, these variables should be adjusted according to the problem at hand. For problems requiring a large number of hidden neurons this can be an issue, as the time for each iteration of the network will increase drastically.


Chapter 5

Acknowledgements

We acknowledge Prof. M. Wallin for his always equally helpful answers to all our questions and encouragement during the work.


Appendices


# -*- coding: utf-8 -*-
# Using neural networks to solve for the ground state energy
# of the one-dimensional particle in a box
# Alicia Bråtner and Natalia Ermakova 2019-05-18
# Supervised by Prof. Mats Wallin
# Reference: Peiyuan Teng, PRE 98, 033305 (2018)

import numpy as np
from numpy.linalg import inv
import matplotlib
from matplotlib import pyplot as plt
import random

matplotlib.rc('xtick', labelsize=20)
matplotlib.rc('ytick', labelsize=20)
font = {'family': 'normal',
        'weight': 'normal',
        'size': 20}
matplotlib.rc('font', **font)


# 2 neural networks of different sizes and learning rates
class neural_network(object):

    def __init__(self):
        self.hiddensize = [6, 6]    # hidden neurons [exact, metropolis]
        self.nmax = 20              # truncated eigenstates nmax = 20
        self.alpha = [0.05, 0.05]   # alpha learning rate [exact, metropolis]
        self.a = -8                 # linear potential
        # weight vectors (ci) from input layer to hidden layer
        self.C = [np.random.random(self.hiddensize[0]),
                  np.random.random(self.hiddensize[1])]
        # weight vectors (ai) from hidden to output layer
        self.A = [np.random.random(self.hiddensize[0]),
                  np.random.random(self.hiddensize[1])]
        self.B = 1  # b = 1

    def gethiddensize(self):
        return self.hiddensize

    def getalpha(self):
        return self.alpha

    def geta(self):
        return self.a

    # generates output from neural network
    # feed forward = psi function
    # returns scalar value psi
    def psi(self, n, flag):
        psi = 0.
        for M in range(1, self.hiddensize[flag]):
            psi += self.A[flag][M] * self.gauss(n, self.C[flag][M], self.B)
        return psi

    # activation function
    # returns scalar
    def gauss(self, n, c, b):
        return np.exp(-np.abs(b) * np.abs(n - c)**2)

    # returns scalar elocal = energy for eigenstate n
    def e_local(self, n):
        return (n**2 * np.pi**2) / 2

    # metropolis function
    # input iteration number, number of samples, psi, operator_a,
    # operator_c, flag = 1 for metropolis expectation values
    # calculates expectation values and returns ground state energy
    # and matrices S and F
    def metropolis(self, k, iter, psi, operator_a, operator_c, flag=1):
        l = self.hiddensize[flag]
        O = [np.zeros(l), np.zeros(l)]                          # <Oi>
        Eav = 0.                                                # <Eloc>
        OO = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]   # <OiOj>
        EO = [np.zeros(l), np.zeros(l)]                         # <ElocOi>
        S = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]    # S-matrix
        F = [np.zeros(l), np.zeros(l)]
        ni = 1  # initial value of n
        for itera in range(iter):
            nj = ni - 1
            if random.random() > 0.5:
                nj = ni + 1
            if nj > 0 and nj < self.nmax:
                if (psi[nj] / psi[ni])**2 > random.random():
                    ni = nj
            E = self.e_local(ni)  # E local
            psi_ni = psi[ni]
            if self.a != 0:  # if linear potential
                for n1 in range(1, self.nmax):
                    if n1 == ni:
                        E += 0.5 * self.a
                    elif np.mod(n1 + ni, 2) != 0:
                        E += -8. * self.a * ni * n1 * psi[n1] \
                            / (((n1 - ni)**2) * ((n1 + ni)**2) * (np.pi**2) * psi_ni)
            Eav += E
            for i in range(l):
                O[0][i] += operator_a[ni, i]
                O[1][i] += operator_c[ni, i]
                EO[0][i] += E * operator_a[ni, i]
                EO[1][i] += E * operator_c[ni, i]
                for j in range(l):
                    OO[0][i, j] += operator_a[ni, i] * operator_a[ni, j]
                    OO[1][i, j] += operator_c[ni, i] * operator_c[ni, j]
        Eav = Eav / iter
        O[0] = np.divide(O[0], iter)
        O[1] = np.divide(O[1], iter)
        EO[0] = np.divide(EO[0], iter)
        EO[1] = np.divide(EO[1], iter)
        OO[0] = np.divide(OO[0], iter)
        OO[1] = np.divide(OO[1], iter)
        for i in range(l):
            F[0][i] = EO[0][i] - Eav * O[0][i]
            F[1][i] = EO[1][i] - Eav * O[1][i]
            for j in range(l):
                S[0][i, j] = OO[0][i, j] - O[0][i] * O[0][j]
                S[1][i, j] = OO[1][i, j] - O[1][i] * O[1][j]
        r = np.maximum(100. * 0.9**k, 1.e-4)
        for i in range(l):
            S[0][i, i] *= (1. + r)
            S[1][i, i] *= (1. + r)
        return [S, F, Eav]

    # optimizing operator for constants a
    # returns scalar
    def operator_a(self, n, i, flag):
        return self.gauss(n, self.C[flag][i], self.B) / self.psi(n, flag)

    # optimizing operator for constants c
    # returns scalar
    def operator_c(self, n, i, flag):
        return 2 * self.A[flag][i] * np.abs(self.B) * (n - self.C[flag][i]) \
            * self.gauss(n, self.C[flag][i], self.B) / self.psi(n, flag)

    # calculates the expectation values exactly
    # input iteration number, operator_a, operator_c, psi,
    # flag = 0 for exact expectation values
    # returns ground state energy E and matrices S and F
    def S_F_exact(self, k, operator_a, operator_c, psi, flag=0):
        l = self.hiddensize[flag]
        O = [np.zeros(l), np.zeros(l)]                          # <Oi>
        E = 0.                                                  # <Eloc>
        OO = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]   # <OiOj>
        EO = [np.zeros(l), np.zeros(l)]                         # <ElocOi>
        S = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]    # S-matrix
        F = [np.zeros(l), np.zeros(l)]
        psi_sum = 0.
        for n in range(1, self.nmax):
            psi_sq = psi[n]**2
            E_loc = self.e_local(n)
            psi_n = psi[n]
            if self.a != 0:  # if linear potential
                for n1 in range(1, self.nmax):
                    if n1 == n:
                        E_loc += 0.5 * self.a
                    elif np.mod(n1 + n, 2) != 0:
                        E_loc += -8. * self.a * n * n1 * psi[n1] \
                            / (((n1 - n)**2) * ((n1 + n)**2) * (np.pi**2) * psi_n)
            E += E_loc * psi_sq
            psi_sum += psi_sq
            for i in range(l):
                O[0][i] += operator_a[n, i] * psi_sq
                O[1][i] += operator_c[n, i] * psi_sq
                EO[0][i] += E_loc * operator_a[n, i] * psi_sq
                EO[1][i] += E_loc * operator_c[n, i] * psi_sq
                for j in range(l):
                    OO[0][i, j] += operator_a[n, i] * operator_a[n, j] * psi_sq
                    OO[1][i, j] += operator_c[n, i] * operator_c[n, j] * psi_sq
        O[0] = np.divide(O[0], psi_sum)
        O[1] = np.divide(O[1], psi_sum)
        EO[0] = np.divide(EO[0], psi_sum)
        EO[1] = np.divide(EO[1], psi_sum)
        OO[0] = np.divide(OO[0], psi_sum)
        OO[1] = np.divide(OO[1], psi_sum)
        E = np.divide(E, psi_sum)
        for i in range(l):
            F[0][i] = EO[0][i] - E * O[0][i]
            F[1][i] = EO[1][i] - E * O[1][i]
            for j in range(l):
                S[0][i][j] = OO[0][i, j] - O[0][i] * O[0][j]
                S[1][i][j] = OO[1][i, j] - O[1][i] * O[1][j]
        r = np.maximum(100. * 0.9**k, 1.e-4)
        for i in range(l):
            S[0][i, i] = S[0][i, i] * (1. + r)
            S[1][i, i] = S[1][i, i] * (1. + r)
        return [S, F, E]

    # update parameters A, C with metropolis expectation values
    # input: iteration number
    # returns scalar ground state energy
    def update(self, k):
        psi = []
        operator_a = np.zeros(shape=(self.nmax, self.hiddensize[1]))
        operator_c = np.zeros(shape=(self.nmax, self.hiddensize[1]))
        for n in range(self.nmax):
            psi.append(self.psi(n, 1))
            for i in range(self.hiddensize[1]):
                operator_a[n, i] = self.operator_a(n, i, 1)
                operator_c[n, i] = self.operator_c(n, i, 1)
        [S, F, E] = self.metropolis(k, 10000, psi, operator_a, operator_c)
        S_inv = [np.linalg.pinv(S[0]), np.linalg.pinv(S[1])]
        # update parameter A
        self.A[1] = self.A[1] - self.alpha[1] * np.dot(S_inv[0], F[0])
        # update parameter C
        self.C[1] = self.C[1] - self.alpha[1] * np.dot(S_inv[1], F[1])
        return E

    # update parameters A, C with exact expectation values
    # input: iteration number
    # returns scalar ground state energy
    def update_exact(self, k):
        psi = []
        operator_a = np.zeros(shape=(self.nmax, self.hiddensize[0]))
        operator_c = np.zeros(shape=(self.nmax, self.hiddensize[0]))
        for n in range(self.nmax):
            psi.append(self.psi(n, 0))
            for i in range(self.hiddensize[0]):
                operator_a[n, i] = self.operator_a(n, i, 0)
                operator_c[n, i] = self.operator_c(n, i, 0)
        [S, F, E] = self.S_F_exact(k, operator_a, operator_c, psi)
        S_inv = [np.linalg.pinv(S[0]), np.linalg.pinv(S[1])]
        # update parameter A
        self.A[0] = self.A[0] - self.alpha[0] * np.dot(S_inv[0], F[0])
        # update parameter C
        self.C[0] = self.C[0] - self.alpha[0] * np.dot(S_inv[1], F[1])
        return E


def main():
    NN = neural_network()
    E1 = []
    E2 = []
    E = []
    K = []
    # e = 4.9348022  # a = 0
    # e = 5.92603    # a = 2
    # e = 6.89974    # a = 4
    # e = 8.79508    # a = 8
    e = 0.795078     # a = -8
    print('Iteration nbr, Exact Energy, Metropolis Energy')
    for k in range(100):
        e1 = NN.update_exact(k)
        e2 = NN.update(k)
        K.append(k)
        E1.append(e1)
        E2.append(e2)
        E.append(e)
        print(k, e1, e2)

    fig1 = plt.figure(dpi=100)
    plt.scatter(K, E1)
    plt.plot(K, E, c='k', linewidth=3.0)
    plt.title('Exact Energy' + ', alpha = ' + str(NN.getalpha()[0])
              + ', hidden neurons = ' + str(NN.gethiddensize()[0]))
    plt.xlabel('Iteration number')
    plt.ylabel('Energy')
    plt.show()

    fig2 = plt.figure(dpi=100)
    plt.scatter(K, E2)
    plt.plot(K, E, linewidth=3.0, c='k')
    plt.title('Metropolis Energy' + ', alpha = ' + str(NN.getalpha()[1])
              + ', hidden neurons = ' + str(NN.gethiddensize()[1]))
    plt.xlabel('Iteration number')
    plt.ylabel('Energy')
    plt.show()


main()


Bibliography

[1] Mohammad H. Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko. Quantum Boltzmann machine. Physical Review X, 8(2):021050, 2018.

[2] Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602-606, 2017.

[3] Giuseppe Carleo and Matthias Troyer. Supplementary materials for "Solving the quantum many-body problem with artificial neural networks". Science, 355(6325):602-606, 2017.

[4] Juan Carrasquilla and Roger G. Melko. Machine learning phases of matter. Nature Physics, 13(5):431, 2017.

[5] David J. Griffiths and Darrell F. Schroeter. Introduction to Quantum Mechanics. Cambridge University Press, 2018.

[6] Harvey Gould, Jan Tobochnik, and Wolfgang Christian. An Introduction to Computer Simulation Methods: Applications to Physical Systems, pages 429-430. Addison-Wesley, third edition, 2006.

[7] Hinkelmann. Neural networks. University lecture. Retrieved 2019-06-02. URL: http://didattica.cs.unicam.it/lib/exe/fetch.php?media=didattica:magistrale:kebi:ay_1718:ke-11_neural_networks.pdf.

[8] Roger Penrose. On best approximate solutions of linear matrix equations. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 52, pages 17-19. Cambridge University Press, 1956.

[9] Peiyuan Teng. Machine-learning quantum mechanics: Solving quantum mechanics problems using radial basis function networks. Physical Review E, 98(3):033305, 2018.

[10] Marcel van Gerven and Sander Bohte. Artificial Neural Networks as Models of Neural Information Processing. Frontiers Media SA, 2018.

[11] Hafidz Zulkifli. Understanding learning rates and how it improves performance in deep learning, 2018.

