DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2019

Machine learning and quantum mechanics

NATALIA ERMAKOVA
ALICIA BRÅTNER

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES
Abstract
This thesis aims to use machine learning to solve for the ground state energy of the quantum system corresponding to the particle in a box. A radial basis function (RBF) network is used with Gaussian functions as the variational wave function. The weights in the network are updated so that the energy expectation value is minimized, which is carried out by using the variational Monte Carlo (VMC) method. The method using machine learning succeeds in finding the ground state energy for the particle in a box.
The method also works when a perturbation in the form of a linear potential is added to
the infinite potential well.
Sammanfattning

The purpose of this report is to show how machine learning can be used to approximate the ground state energy of the quantum system corresponding to the particle in a box. A network of radial basis functions has been used, with the network serving as a representation of the variational wave function. The weights of the network have been updated so that the expectation value of the energy is minimized. The energy minimization has been carried out using the variational Monte Carlo method. The solution method presented gives a good approximation of the ground state energy of the particle in a box. The method also works when a perturbation in the form of a linear potential is added to the system.
Contents

1 Introduction
2 Background Material
  2.1 Artificial neural network
  2.2 Particle in a box
  2.3 Quantum-mechanical expectation values
  2.4 Metropolis algorithm
  2.5 Pseudoinverse
  2.6 Regularization
  2.7 Variational Monte Carlo method
3 Investigation
  3.1 Model
  3.2 Results
    3.2.1 The effect of the learning rate
    3.2.2 The effect of the number of hidden layer neurons
    3.2.3 Perturbation with linear potential
  3.3 Discussion
4 Summary and Conclusions
5 Acknowledgements
Appendices
Chapter 1
Introduction
Machine learning is a rapidly developing field in research and industry. Recently it has also been applied to study physical phenomena [4][1]. Applying machine learning methods within quantum mechanics is therefore an interesting and timely problem.
Machine learning, and artificial neural networks in particular, can be used to solve problems within quantum mechanics. This is shown in several research papers, some of which are used as references in this thesis. Teng demonstrates how an RBF network can be used to find the ground state energy of a particle in a box and of a harmonic oscillator [9]. He shows that the ground state energy can be found by combining an artificial neural network with a variational method: the parameters of the variational wave function are used as the parameters of the neural network, and are thus adjusted in the process of training the network. The method described in Teng's paper is used in this thesis.
Another example is Carleo and Troyer’s research article which describes the advan- tages of using machine learning when solving quantum many-body problems [2]. Carleo and Troyer discuss the difficulties in finding the ground state energy for a many-body system with interacting fermions and the possibility to reduce the exponential complexity by using machine learning.
This thesis focuses on using artificial neural networks (ANN) as a tool to solve simple
problems within quantum mechanics such as finding the ground-state energy of a particle
in an infinite square well. The possibility of obtaining a correct solution is investigated
as well as the precision of the solution compared to other methods of solving the same
problem. The effect of the parameters of the network is also examined.
Chapter 2
Background Material
2.1 Artificial neural network
The main idea behind an artificial neural network is to mimic a biological brain, that is, to recognize patterns and learn by training [10]. The name artificial neural network (ANN) acts as a collective name for a variety of algorithms that do this. Common to most of these algorithms is that they consist of neurons, which carry a numerical value and are interconnected via synaptic couplings. The strength of the couplings is modeled by weight functions, and the learning process corresponds to adjusting the weights. As in a biological brain, different neurons are activated for different stimuli or inputs, where the degree of activation is determined by the numerical value of the weight function [7].
The numerical value in each neuron is calculated according to an activation function, and depends on the values of the previous neurons and the weights between them. A neural network often has at least three layers of neurons: an input layer, a hidden layer and an output layer, as seen in Fig. 2.1. In a feed-forward network, the calculations are carried out in order, starting from the input layer and ending in the output layer. In Fig. 2.1, n represents the input to the network and c_i the weights between the input layer and the hidden layer. The activation functions are denoted ρ_i and determine the values of the hidden neurons. Further, a_i represent the weights between the hidden layer and the output layer, and the output value is a weighted sum of the hidden neurons, so that

    output = \sum_i a_i \rho_i.    (2.1)
Equation (2.1) is valid for radial basis function (RBF) networks. An RBF network uses activation functions that depend only on the distance from a certain point, for example Gaussian functions. Other types of neural networks can take many other forms, with different relations between the neurons. For example, it is common to use multiple input and output neurons, or several hidden layers. The activation function also varies between different networks. In order for the neural network to learn, it is also equipped with a learning rule which aims to change the network's input-output behavior. More concretely, the learning rule adjusts the weights of the network. Each time the network is fed with information, the learning rule is applied so that the weights change for the next run. The network can then be iteratively trained until the desired behavior is achieved [7].
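As an illustration, the feed-forward pass of Eq. (2.1) with Gaussian activations can be sketched in a few lines of numpy. This is only a minimal sketch: the function name rbf_output is chosen here for illustration, while the parameter names a, b, c follow the notation used later in this thesis.

```python
import numpy as np

def rbf_output(n, a, b, c):
    """Feed-forward pass of an RBF network, output = sum_i a_i * rho_i (Eq. 2.1),
    with Gaussian activations rho_i = exp(-|b_i| * |n - c_i|^2)."""
    rho = np.exp(-np.abs(b) * np.abs(n - c) ** 2)  # hidden-layer neuron values
    return float(np.sum(a * rho))                  # weighted sum = output neuron

# two hidden neurons centred at c = 0 give output a_1 + a_2 = 2 for input n = 0
value = rbf_output(0.0, np.ones(2), np.ones(2), np.zeros(2))
```

Training the network then amounts to adjusting a, b and c, which is what the learning rule of Section 2.7 does.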
The learning rate is a hyperparameter, i.e. a parameter that is set to a certain value before the training. The learning rate determines how much the parameters of the network are adjusted in each iteration [11]. It is important to choose a suitable learning rate: too small a value can lead to slow convergence, while too large a value can cause instability in the convergence and unreliable results.
Figure 2.1: Illustration of a neural network.
2.2 Particle in a box
The one-dimensional model of the "particle in a box" describes a particle within a well defined by infinite potential walls on each side. It is used to illustrate quantum mechanical phenomena. The potential inside the well is zero, so the particle is free to move. A mathematical model of the potential for a well of length L is given by
    V(x) =
    \begin{cases}
    0, & \text{if } 0 < x < L, \\
    \infty, & \text{otherwise}.
    \end{cases}    (2.2)
Since the walls are impenetrable, the wave function is ψ = 0 outside the well. By solving the time-independent Schrödinger equation,

    -\frac{\hbar^2}{2m} \frac{\partial^2 \psi(x)}{\partial x^2} + V(x)\psi(x) = E\psi(x),    (2.3)
and using continuity at both ends of the well, ψ(0) = ψ(L) = 0, and normalizing ψ, the solutions read

    \psi_n(x) = \sqrt{\frac{2}{L}} \sin\left(\frac{n\pi}{L} x\right),    (2.4)

with eigenvalues

    E_n = \frac{n^2 \pi^2 \hbar^2}{2 m L^2},    (2.5)
where m is the mass of the particle. In natural units and with length L = 1 and mass m = 1, Eq. (2.5) simplifies to

    E_n = \frac{n^2 \pi^2}{2}.    (2.6)

The ground state energy is then given by E_1 = \pi^2/2, with the first 10 decimals given by E_1 = 4.9348022005.
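These eigenvalues are easy to reproduce numerically. The following short check evaluates Eq. (2.5), with the function name box_energy introduced here for illustration; the default arguments correspond to the natural units used above.

```python
import numpy as np

def box_energy(n, L=1.0, m=1.0, hbar=1.0):
    """Energy eigenvalues of the infinite square well, Eq. (2.5)."""
    return n**2 * np.pi**2 * hbar**2 / (2.0 * m * L**2)

# in natural units (hbar = m = L = 1) the ground state is E_1 = pi^2 / 2
E1 = box_energy(1)
```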
This thesis also considers a particle in a box with a perturbation in the form of a linear potential. With this potential, the Hamiltonian reads

    \hat{H} = \frac{\hat{p}^2}{2} + V(x) + a\hat{x} = \hat{H}_0 + a\hat{x},    (2.7)

where a is the force constant of the perturbation. First order perturbation theory is used, so the energy with the perturbation takes the form

    E_n = \frac{n^2 \pi^2}{2} + \langle n| a\hat{x} |n\rangle = \frac{n^2 \pi^2}{2} + \frac{aL}{2}.    (2.8)
2.3 Quantum-mechanical expectation values
Generally, the expectation value of an observable A is given by

    \langle A \rangle = \frac{\int_{-\infty}^{\infty} \Psi^* A \Psi \, dx}{\int_{-\infty}^{\infty} \Psi^* \Psi \, dx} = \frac{\langle\Psi|A\Psi\rangle}{\langle\Psi|\Psi\rangle},    (2.9)

where Ψ is a wave function that is not necessarily normalized [5].
In this thesis the energy of a particle in a box with infinite potential walls is studied. The time-independent Schrödinger equation can be written as

    \hat{H}\psi = E\psi.    (2.10)

Therefore, the values of E can be seen as eigenvalues of \hat{H} and ψ as eigenfunctions of \hat{H}.
As seen in Eq. (2.6), the Hamiltonian of the particle in a box with infinite potential walls has discrete eigenvalues. Therefore it is of interest to look at expectation values of operators with a discrete spectrum, where the expectation value for an operator A in a general state

    |\psi\rangle = \sum_n c_n |f_n\rangle = \sum_n \langle f_n|\psi\rangle \, |f_n\rangle,    (2.11)

with normalized eigenstates |f_n\rangle and eigenvalues a_n, is given by

    \langle A \rangle = \sum_n a_n \, |\langle f_n|\psi\rangle|^2,    (2.12)

where P_n = |\langle f_n|\psi\rangle|^2 is the probability to find the system in state f_n when measuring A [5].
2.4 Metropolis algorithm
The Metropolis algorithm is a Markov chain Monte Carlo method which can be used to compute expressions of the form

    \langle f \rangle = \frac{\int f(x) p(x) \, dx}{\int p(x) \, dx}.    (2.13)

The algorithm involves constructing a random walk, i.e. a sequence of randomly chosen points (steps) {x_i}, such that their asymptotic probability distribution approaches p(x) after many steps. The random walk is defined by a transition probability T(x → x′), which is the probability of the walk taking a step from x to x′. A common choice of transition probability is given by [6]

    T(x \to x') = \min\left(1, \frac{p(x')}{p(x)}\right).    (2.14)

After a random walk of N steps has been generated, the expectation value is evaluated by

    \langle f \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i).    (2.15)
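The procedure of Eqs. (2.13)-(2.15) can be sketched as follows. This is only an illustration, not the implementation used in this thesis: the function metropolis_expectation is a name introduced here, and the unnormalized Gaussian weight p(x) and observable f(x) = x² are stand-ins chosen because the exact answer, ⟨x²⟩ = 1, is known.

```python
import math
import random

def metropolis_expectation(f, p, n_steps=200000, step=1.0, x0=0.0, seed=1):
    """Estimate <f> of Eq. (2.13) with a Metropolis random walk (Eqs. 2.14-2.15)."""
    rng = random.Random(seed)
    x = x0
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # propose a step
        if p(x_new) / p(x) > rng.random():    # accept with probability min(1, p'/p)
            x = x_new
        total += f(x)                         # accumulate f along the walk
    return total / n_steps                    # Eq. (2.15)

p = lambda x: math.exp(-x**2 / 2.0)           # unnormalized Gaussian weight
estimate = metropolis_expectation(lambda x: x**2, p)  # exact value is 1
```

Note that p(x) never needs to be normalized, since only the ratio p(x′)/p(x) enters the acceptance test.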
2.5 Pseudo inverse
The pseudoinverse, or the Moore-Penrose inverse, is a generalization of the inverse of a matrix. The pseudoinverse A⁺ of an m × n matrix A with entries in R or C must satisfy the following conditions:

1. A A⁺ A = A
2. A⁺ A A⁺ = A⁺
3. (A A⁺)* = A A⁺ (* denotes the conjugate transpose)
4. (A⁺ A)* = A⁺ A

These conditions guarantee the existence and uniqueness of A⁺. It follows that if A is invertible, the pseudoinverse equals the inverse, A⁺ = A⁻¹. The Moore-Penrose inverse also provides an inverse for singular matrices. The numpy library contains a built-in pseudoinverse function which is calculated using the singular value decomposition [8].
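A minimal numpy check of these properties, using a rank-deficient example matrix (chosen here only for illustration) that has no ordinary inverse:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])          # rank 1, so no ordinary inverse exists
A_plus = np.linalg.pinv(A)          # Moore-Penrose pseudoinverse via SVD

# the four Penrose conditions
cond1 = np.allclose(A @ A_plus @ A, A)
cond2 = np.allclose(A_plus @ A @ A_plus, A_plus)
cond3 = np.allclose((A @ A_plus).conj().T, A @ A_plus)
cond4 = np.allclose((A_plus @ A).conj().T, A_plus @ A)

# for an invertible matrix the pseudoinverse reduces to the ordinary inverse
B = np.array([[2.0, 0.0], [0.0, 4.0]])
reduces_to_inverse = np.allclose(np.linalg.pinv(B), np.linalg.inv(B))
```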
2.6 Regularization
As will be shown in Section 2.7, an inverse of the covariance matrix is used to adjust the parameters of the network. The inverse of the covariance matrix does not always exist, and therefore the Moore-Penrose pseudoinverse is used. However, even with the pseudoinverse, small values on the matrix diagonal can lead to unstable energy values. Therefore a matrix regularization is used, i.e. an additional term is added to the matrix diagonal. The elements on the matrix diagonal are thus given by

    S'_{ii} = S_{ii} + r(k) S_{ii}.    (2.16)

Here S denotes the covariance matrix and r(k) is a parameter which is a function of the iteration number k [3].
2.7 Variational Monte Carlo method
The variational Monte Carlo (VMC) method is a way to find an approximation of the ground state energy of a system. In this method an ansatz for the wave function is used: it is assumed that the wave function is equal to some function ψ(λ), where λ is a set of variational parameters. If \hat{H} is the Hamiltonian operator, the energy expectation value can be written as

    E(\lambda) = \frac{\langle \psi(\lambda)| \hat{H} |\psi(\lambda)\rangle}{\langle \psi(\lambda)|\psi(\lambda)\rangle}.    (2.17)

This value can be approximated by using the Metropolis algorithm [9].
After the energy expectation value has been evaluated it is minimized by changing the parameters in the variational wave function. In this thesis the stochastic reconfiguration method will be used to adjust the parameters [3].
To use the stochastic reconfiguration method, the variational derivatives need to be introduced:

    O_i(n) = \frac{\partial_{\lambda_i} \psi_\lambda(n)}{\psi_\lambda(n)}.    (2.18)

Here, n represents some configuration of the system and O_i(n) is the derivative of the wave function ansatz with respect to the ith parameter, divided by the ansatz itself.
To update the parameters, the covariance matrix S and the forces F need to be introduced. They are defined by

    S_{ij} = \langle O_i^* O_j \rangle - \langle O_i^* \rangle \langle O_j \rangle,    (2.19)

    F_i = \langle E_{local} O_i^* \rangle - \langle E_{local} \rangle \langle O_i^* \rangle.    (2.20)

The elements of the S-matrix represent the covariance between O_i and O_j, that is, the linear relationship between them. F can be interpreted as a force that pulls the parameters towards the energy minimum. In these equations \langle \ldots \rangle denotes expectation values, which are also evaluated using the Metropolis algorithm [9].
The local energy is introduced and defined by

    E_{local}(n) = \frac{\langle n| \hat{H} |\psi(\lambda)\rangle}{\langle n|\psi(\lambda)\rangle} = \frac{\sum_{n'} \langle n| \hat{H} |n'\rangle \langle n'|\psi(\lambda)\rangle}{\langle n|\psi(\lambda)\rangle},    (2.21)

where the matrix elements of the perturbation for the particle in the box are [9]

    \langle n| a\hat{x} |n'\rangle =
    \begin{cases}
    \dfrac{4a \left[(-1)^{n+n'} - 1\right] n n'}{(n - n')^2 (n + n')^2 \pi^2}, & \text{if } n \neq n', \\
    0.5a, & \text{if } n = n'.
    \end{cases}    (2.22)
A regularization is used for the diagonal elements of the matrix S:

    S'_{ii} = S_{ii} + r(k) S_{ii}.    (2.23)

Here, k is the number of the current iteration and r(k) is given by

    r(k) = \max(100 \cdot 0.9^k, 10^{-4}).    (2.24)
The parameters are then updated according to the stochastic reconfiguration method,

    \lambda'_j = \lambda_j - \alpha \sum_i (S^{-1})_{ji} F_i.    (2.25)

Here, α represents the learning rate of the network and S⁻¹ is the Moore-Penrose pseudoinverse, since the covariance matrix is not always invertible. As a result, each parameter of the variational wave function is updated, with λ′_j as the element with index j of the new parameter vector and λ_j as the old. The vector F can be seen as a force that pulls the parameters towards the energy minimum, and becomes 0 when the minimum has been found.
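One update step of Eq. (2.25) then amounts to a single matrix-vector product. The following is a minimal sketch with placeholder values for λ, S and F (the function name sr_update is introduced here for illustration):

```python
import numpy as np

def sr_update(lam, S, F, alpha=0.05):
    """Stochastic reconfiguration step, Eq. (2.25): lambda' = lambda - alpha * S^+ F.
    The Moore-Penrose pseudoinverse is used since S may be singular."""
    return lam - alpha * np.linalg.pinv(S) @ F

# with S = identity this reduces to a plain gradient-descent-like step on F
lam = np.array([1.0, 2.0])
updated = sr_update(lam, np.eye(2), np.array([1.0, 2.0]))
```

Choosing S as the identity shows the role of the covariance matrix: it rescales the raw force F according to the correlations between the variational derivatives.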
Chapter 3
Investigation
3.1 Model
This thesis concentrates on neural networks consisting of three layers of neurons: an input layer, a hidden layer and an output layer, as seen in Fig. 2.1. The network has one input neuron and one output neuron, but different numbers of hidden neurons are examined. From a given input n in the input layer, the values of the other neurons are calculated as follows. The value of each neuron in the hidden layer is calculated using an activation function ρ_i, which depends on the variable n and the parameter c_i.
As stated before, the output of the neural network is given by

    output = \sum_{i}^{M} a_i \rho_i,    (3.1)

where M is the number of neurons in the hidden layer. In this thesis Gaussian functions are used as activation functions, therefore

    \rho_i(\|x - c_i\|) = e^{-|b_i| |x - c_i|^2}.    (3.2)

The ansatz of the amplitude function is set to
    \psi(n_1, n_2, \ldots, n_p, a, b, c) = \sum_{i}^{M} a_i e^{-|b_i| |n - c_i|^2}.    (3.3)

Thus, a_i, b_i, c_i are the ANN parameters, which from now on will be denoted λ. They can also be seen as the variational parameters in the VMC method. Thus, the parameters a_i, b_i, c_i of the variational wave function become parameters of the neural network, associated with the weights between the neurons of the network.
Since the one-dimensional particle in a box is considered, n and c_i become scalars. Furthermore, n is restricted by 1 ≤ n ≤ n_max = 20 in order to avoid an infinite sum. Now the wave function can be expressed as a linear combination of the energy eigenbasis |n\rangle:

    |\psi\rangle = \sum_{n=1}^{20} \psi(n) |n\rangle,    (3.4)
where ψ(n) is the amplitude function in Eq. (3.3). Therefore, the RBF-network can be seen as a representation of the amplitude function.
The amplitude function can now be used to estimate the energy value. As previously mentioned,

    E(\lambda) = \frac{\langle \psi(\lambda)| \hat{H} |\psi(\lambda)\rangle}{\langle \psi(\lambda)|\psi(\lambda)\rangle}
    = \frac{\sum_{n,n'} \langle\psi|n\rangle \langle n| \hat{H} |n'\rangle \langle n'|\psi\rangle}{\sum_n \langle\psi|n\rangle \langle n|\psi\rangle}
    = \frac{\sum_n |\psi(n, \lambda)|^2 \, E_{local}(n, \lambda)}{\sum_n |\psi(n, \lambda)|^2} = \langle E \rangle.    (3.5)
By comparing with Eq. (2.15) it is easy to see that the energy expectation value can be evaluated using the Metropolis algorithm. When using the Metropolis algorithm in this case, a random move from n to n'' = n ± 1 is performed. The number of samples is set to 10000. The move is either accepted or rejected with a transition probability T(n → n'') given by

    T(n \to n'') = \min\left(1, \frac{|\psi(n''; \lambda)|^2}{|\psi(n; \lambda)|^2}\right).    (3.6)
After the move has been determined, the value of the amplitude function ψ is calculated for this quantum number using Eq. (3.3). The local energy can then be determined using the formula

    E_{local}(n, \lambda) = \frac{\langle n| \hat{H} |\psi\rangle}{\langle n|\psi\rangle} = \frac{\sum_{n'} \langle n| \hat{H} |n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle}.    (3.7)
The Hamiltonian of the unperturbed system is \hat{H}_0 = \hat{p}^2/2 + V(x). Since the |n\rangle are the energy eigenbasis with eigenvalues E_n, it follows that \langle n|n'\rangle = \delta_{nn'}, and E_{local} can be simplified as follows:

    E_{local}(n, \lambda) = \frac{\sum_{n'} \langle n| \hat{H}_0 |n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle} = \frac{\sum_{n'} E_{n'} \langle n|n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle} = \frac{E_n \langle n|\psi\rangle}{\langle n|\psi\rangle} = E_n = \frac{n^2 \pi^2}{2}.    (3.8)
A different result is obtained for the perturbed system with Hamiltonian \hat{H} = \hat{H}_0 + a\hat{x}. With again the unperturbed energy eigenbasis,

    E_{local}(n, \lambda) = \frac{\sum_{n'} \langle n| \hat{H}_0 + a\hat{x} |n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle} = E_n + \frac{\sum_{n'} \langle n| a\hat{x} |n'\rangle \langle n'|\psi\rangle}{\langle n|\psi\rangle},    (3.9)

where \langle n| a\hat{x} |n'\rangle is calculated according to Eq. (2.22).
The stochastic reconfiguration operators for each parameter can also be calculated using the ψ obtained in this Metropolis step and the current parameters of the network, using Eq. (2.18):

    O_{a_i}(n) = \frac{\rho_i}{\psi},    (3.10)

    O_{b_i}(n) = -\frac{a_i b_i |n - c_i|^2 \rho_i}{|b_i| \, \psi},    (3.11)

    O_{c_i}(n) = \frac{2 a_i |b_i| (n - c_i) \rho_i}{\psi}.    (3.12)

By using the above equations, the expectation values \langle O_i^* O_j \rangle, \langle O_i^* \rangle, \langle O_j \rangle and \langle E_{local} O_i^* \rangle are calculated, which are then used to calculate the covariance matrix and the force vector according to Eqs. (2.19) and (2.20). The expectation values can be calculated either using the Metropolis method or exactly. In this thesis, expectation values are mainly calculated using the Metropolis method. However, by following the same procedure as above but exchanging the Metropolis calculations of the expectation values for the exact calculations, the correctness can be checked. The exact expectation values are calculated according to Eq. (2.12):
    \langle O_i \rangle = \frac{\sum_{n=1}^{20} O_i(n) \, |\langle n|\psi\rangle|^2}{\sum_{n=1}^{20} |\langle n|\psi\rangle|^2},    (3.13)

where n is truncated at n_max = 20 and |\langle n|\psi\rangle|^2 = |\psi(n)|^2 is the (unnormalized) probability to find the particle in state |n\rangle. The sum in the denominator corresponds to the normalizing constant.
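Over the truncated basis, Eq. (3.13) is just a weighted average. A minimal sketch (the function name exact_expectation and the sample values are chosen here for illustration):

```python
import numpy as np

def exact_expectation(O_vals, psi_vals):
    """Eq. (3.13): <O> = sum_n O(n) |psi(n)|^2 / sum_n |psi(n)|^2
    over the truncated basis n = 1, ..., n_max."""
    weights = np.abs(psi_vals) ** 2              # unnormalized probabilities
    return float(np.sum(O_vals * weights) / np.sum(weights))

# with equal amplitudes the expectation value is the plain average of O
avg = exact_expectation(np.array([1.0, 2.0, 3.0, 4.0]), np.ones(4))
```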
Given that in this case the operators are always real, it follows that O_i^* = O_i for all i. The values of O_i and O_i O_j are calculated for each parameter of the network and stored in a vector and a matrix, respectively.
As the last step for an iteration on the network, each parameter is updated according to Eq. (2.25).
It is expected that by performing the operations described above for a sufficient amount of iterations the energy will be minimized and a correct result will be obtained.
3.2 Results
The results obtained with random initial values of the network parameters, M hidden layer neurons and a learning rate of α = 0.05 were satisfactory and gave an answer close to the analytical value of the ground state energy. The results of some simulations are presented in the figures below. It should also be mentioned that large energy values were cut off and set to a constant value in order to show details of the convergence. The graphs presented are a selection from many runs with different start parameters. The energy on the vertical axis is given in natural units.
(a) Energy convergence obtained with Metropolis method.
(b) Energy convergence obtained with the exact calculations.
Figure 3.1: The convergence of the ground state energy
As can be seen in Fig. 3.1 above, the value of the ground state energy converges to the exact value E_1 = 4.9348022005. It is worth noting that a spike appears in the top graph.
Some problems with the covariance matrix S were encountered. For some values of the initial parameters numpy failed to obtain the pseudoinverse of the covariance matrix and the program crashed. This behaviour will be discussed in Section 3.3.
3.2.1 The effect of the learning rate
The convergence for 4 different learning rates α can be seen in Fig. 3.2. As previously
mentioned, large energy values are cut off at E = 15 for the sake of convenience. It can
be seen that small values of α generally result in slower convergence. Large α instead
provides faster but more unstable convergence.
Figure 3.2: Ground state energy convergence for different learning rates α.
3.2.2 The effect of the number of hidden layer neurons
The convergence with M = 2, 4, 6, 10 number of hidden neurons can be seen in Fig. 3.3.
The energy converges for all numbers of neurons, but is unstable for M = 10. It should also be noted that the graph in Fig. 3.3 shows only one of many test runs with different start parameters. After many runs, M = 6 neurons was judged to provide a fast and stable convergence.
Figure 3.3: Ground state energy convergence for different numbers of hidden neurons M .
3.2.3 Perturbation with linear potential
Finally, the convergence of the linear potential was examined. The result is shown in Fig.
3.4, with a = 0, 2, 4, 8, −8. Here α = 0.05 and M = 6 neurons are used. The expectation values are calculated using the Metropolis method in the graph.
Figure 3.4: Energy convergence with linear perturbations a.
The energy values obtained analytically are taken from Teng's paper [9] and presented in the table below. Only values computed with exact expectation values are presented in the table, since the Metropolis method failed to reproduce the same decimal sequence.
Even with the exact expectation values it was difficult to reproduce the same decimal sequence, but after much fine-tuning of the parameters (M and α) and many runs of the network, the values that appeared several times were used; they can be seen in Tab. 3.1. The energy values are not rounded but truncated after the fifth decimal. The energy value for the linear potential a = -8 is not included in the table because it converged to different energy values each time.
a      Exact value    VMC (exact expectation values)
0.0    4.93480        4.93480
2.0    5.92603        5.92603
4.0    6.89974        6.89979
8.0    8.79508        8.79552

Table 3.1: The analytical values compared to the VMC values of the ground state energy of the particle in a box with perturbation.
As can be seen in Fig. 3.4 and Tab. 3.1 the energy converges to values that correspond
to the exact values.
3.3 Discussion
Overall the constructed ANN provided results that were close to the ground state energy value obtained analytically. The behaviour of the model with respect to different linear potentials was also reasonable.
The problem with calculating the pseudoinverse of the covariance matrix was possibly due to some choices of initial parameters. For some choices the attempt to minimize the energy led to the wave function becoming equal to zero. This led to numpy evaluating the stochastic reconfiguration operators as "Not a Number", since the stochastic reconfiguration operators are obtained by dividing by the wave function (Eqs. (3.10)-(3.12)).
This leads to the conclusion that the initial parameters should be chosen carefully in order to obtain the best results.
However, it can be stated that, regardless of the choice of starting parameters, the convergence still looks irregular with spikes. This can be seen both in the exact and the Metropolis convergence, though no figure of the exact energy convergence with spikes is included in this thesis. The Metropolis method, however, is much more uneven than the exact method, which is probably due to the stochastic nature of the Metropolis algorithm.
This leads to the suspicion that the problem has to do with the stochastic reconfiguration method Eqs. (2.19) - (2.25), which thus creates instability in some unresolved way.
As stated above, it was noted that with an increasing learning rate the convergence was faster but less stable than for smaller values of α. By definition, the learning rate determines how much the parameters are changed in each iteration. Therefore it should be expected that a larger α leads to faster convergence, as the network then learns faster. However, this can also result in instability, since for example the spikes from the randomness of the Metropolis algorithm then affect the system to an even greater extent. For the same reason, smaller values of α provided a more stable curve but slower convergence.
The effect of the number of hidden neurons can also be explained. As was mentioned in Sec. 2.1, the output of the ANN is given by output = \sum_i^M a_i \rho_i, where M denotes the number of hidden layer neurons. As can be seen in this equation, the final output can be perceived as a sum over all the hidden layer neurons. This can explain the slower convergence for smaller M: if the number of neurons is too small, then the output obtained in each iteration is based on too little data for the network to learn effectively. For the same reason the performance improves for larger M. However, similarly to the case with a larger learning rate, a larger value of M leads to unstable behaviour of the model. This can be explained by the fact that more summands, i.e. more neurons in the hidden layer, give a greater value of ψ and hence of E, which gives higher spikes.
As for the results with the perturbation, it can be stated that the values of the ground
state energy obtained with ANN correspond to the analytical values in perturbation
theory. This is a good sign because then it can be stated that the ANN gives a correct
result for different types of problems, and not only for the unperturbed particle in a box.
Chapter 4
Summary and Conclusions
The attempt to evaluate the ground state energy by minimizing the energy expectation value using the neural network gave satisfactory results close to the analytical values.
Such results were obtained even when a perturbation in the form of a linear potential was added to the model.
The results show that the method in this thesis is successful in solving problems within quantum mechanics, at least for simple one-particle problems. ANNs can potentially be used in a similar manner to effectively solve problems with many particles.
However, it should be mentioned that the overall performance, e.g. the stability and rate of convergence of the neural network, was greatly dependent on the initial parameters, the learning rate and the number of hidden layer neurons. Therefore, these variables should be adjusted according to the problem at hand. For problems requiring a large number of hidden neurons this can be a drawback, as the time for each iteration of the network will increase drastically.
Chapter 5
Acknowledgements
We acknowledge Prof. M. Wallin for his always equally helpful answers to all our ques-
tions and encouragement during the work.
Appendices
# −∗− c o d i n g : u t f −8 −∗−
# Using n e u r a l n e t w o r k s t o s o l v e f o r t h e ground s t a t e e n e r g y
# o f t h e one−d i m e n s i o n a l p a r t i c l e −i n a box
# A l i c i a B r t n e r and N a t a l i a Ermakova 2019−05−18
# S u p e r v i s e d by P r o f . Mats W a l l i n
# R e f e r e n c e : Peiyuan Teng , PRE 9 8 , 033305 ( 2 0 1 8 ) i m p o r t numpy a s np
from numpy . l i n a l g i m p o r t i n v i m p o r t m a t p l o t l i b
from m a t p l o t l i b i m p o r t p y p l o t a s p l t i m p o r t random
m a t p l o t l i b . r c ( ’ x t i c k ’ , l a b e l s i z e =20) m a t p l o t l i b . r c ( ’ y t i c k ’ , l a b e l s i z e =20) f o n t = { ’ f a m i l y ’ : ’ normal ’ ,
’ weight ’ : ’ normal ’ ,
’ s i z e ’ : 20}
m a t p l o t l i b . r c ( ’ f o n t ’ , ∗∗ f o n t )
# 2 n e u r a l n e t w o r k s o f d i f f e r e n t s i z e s and l e a r n i n g r a t e s c l a s s n e u r a l _ n e t w o r k ( o b j e c t ) :
d e f __init__ ( s e l f ) :
s e l f . h i d d e n s i z e = [ 6 , 6 ] # h i d d e n n e u r o n s [ e x a c t , m e t r o p o l i s ] s e l f . nmax = 20 # t r u n c a t e d e i g e n s t a t e s nmax = 20
s e l f . a l p h a = [ 0 . 0 5 , 0 . 0 5 ] # a l p h a l e a r n i n g r a t e [ e x a c t , m e t r o p o l i s ] s e l f . a = −8 # l i n e a r p o t e n t i a l
# w e i g h t v e c t o r s ( c i ) from i n p u t l a y e r t o h i d d e n l a y e r s e l f . C = [ np . random . random ( s e l f . h i d d e n s i z e [ 0 ] ) ,
np . random . random ( s e l f . h i d d e n s i z e [ 1 ] ) ]
# w e i g h t v e c t o r s ( a i ) from h i d d e n t o o u t p u t l a y e r s e l f .A = [ np . random . random ( s e l f . h i d d e n s i z e [ 0 ] ) , np . random . random ( s e l f . h i d d e n s i z e [ 1 ] ) ]
s e l f . B = 1 # b = 1 d e f g e t h i d d e n s i z e ( s e l f ) :
r e t u r n s e l f . h i d d e n s i z e d e f g e t a l p h a ( s e l f ) :
r e t u r n s e l f . a l p h a d e f g e t a ( s e l f ) :
r e t u r n s e l f . a
# g e n e r a t e s o u t p u t from n e u r a l network
# f e e d f o r w a r d = p s i f u n c t i o n
# r e t u r n s s c a l a r v a l u e p s i
d e f p s i ( s e l f , n , f l a g ) : p s i = 0 .
f o r M i n r a n g e ( 1 , s e l f . h i d d e n s i z e [ f l a g ] ) :
p s i += s e l f .A[ f l a g ] [M] ∗ s e l f . g a u s s ( n , s e l f . C [ f l a g ] [M] , s e l f . B) r e t u r n p s i
# a c t i v a t i o n f u n c t i o n
# r e t u r n s s c a l a r
d e f g a u s s ( s e l f , n , c , b ) :
r e t u r n np . exp(−np . abs ( b ) ∗ np . abs ( n−c ) ∗ ∗ 2 )
# r e t u r n s s c a l a r e l o c a l = e n e r g y f o r e i g e n s t a t e n d e f e _ l o c a l ( s e l f , n ) :
r e t u r n ( n ∗∗2∗ np . p i ∗ ∗ 2 ) / 2
# m e t r o p o l i s f u n c t i o n
    # input: iteration number k, number of iterations, operator_a,
    # operator_c, psi, flag = 1 for metropolis expectation values
    # calculates expectation values and returns ground state energy
    # and matrices S and F
    def metropolis(self, k, iter, psi, operator_a, operator_c, flag=1):
        l = self.hiddensize[flag]
        O = [np.zeros(l), np.zeros(l)]                          # <Oi>
        Eav = 0.                                                # <Eloc>
        OO = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]   # <OiOj>
        EO = [np.zeros(l), np.zeros(l)]                         # <ElocOi>
        S = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]    # S-matrix
        F = [np.zeros(l), np.zeros(l)]
        ni = 1  # initial value of n
        for itera in range(iter):
            nj = ni - 1
            if random.random() > 0.5:
                nj = ni + 1
            if nj > 0 and nj < self.nmax:
                if (psi[nj]/psi[ni])**2 > random.random():
                    ni = nj
            E = self.e_local(ni)  # local energy
            psi_ni = psi[ni]
            if self.a != 0:  # if linear potential
                for n1 in range(1, self.nmax):
                    if n1 == ni:
                        E += 0.5*self.a
                    elif np.mod(n1 + ni, 2) != 0:
                        E += (-8.*self.a*ni*n1*psi[n1]
                              / (((n1 - ni)**2)*((n1 + ni)**2)*(np.pi**2)*psi_ni))
            Eav += E
            for i in range(l):
                O[0][i] += operator_a[ni, i]
                O[1][i] += operator_c[ni, i]
                EO[0][i] += E*operator_a[ni, i]
                EO[1][i] += E*operator_c[ni, i]
                for j in range(l):
                    OO[0][i, j] += operator_a[ni, i]*operator_a[ni, j]
                    OO[1][i, j] += operator_c[ni, i]*operator_c[ni, j]
        Eav = Eav/iter
        O[0] = np.divide(O[0], iter)
        O[1] = np.divide(O[1], iter)
        EO[0] = np.divide(EO[0], iter)
        EO[1] = np.divide(EO[1], iter)
        OO[0] = np.divide(OO[0], iter)
        OO[1] = np.divide(OO[1], iter)
        for i in range(l):
            F[0][i] = EO[0][i] - Eav*O[0][i]
            F[1][i] = EO[1][i] - Eav*O[1][i]
            for j in range(l):
                S[0][i, j] = OO[0][i, j] - O[0][i]*O[0][j]
                S[1][i, j] = OO[1][i, j] - O[1][i]*O[1][j]
        r = np.maximum(100.*0.9**k, 1.e-4)
        for i in range(l):
            S[0][i, i] *= (1. + r)
            S[1][i, i] *= (1. + r)
        return [S, F, Eav]
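The random walk at the top of `metropolis` can be illustrated in isolation. The sketch below uses hypothetical target weights `p` (not the RBF wave function of the thesis, where the acceptance ratio is `(psi[nj]/psi[ni])**2`) but implements the same propose-a-neighbour/accept rule:

```python
import random

def metropolis_sample(p, nmax, n_steps, seed=0):
    """Random walk over integer states 1..nmax-1, moving to a
    neighbouring state nj with probability min(1, p[nj]/p[ni])."""
    rng = random.Random(seed)
    ni = 1
    samples = []
    for _ in range(n_steps):
        nj = ni + 1 if rng.random() > 0.5 else ni - 1
        if 0 < nj < nmax and p[nj] / p[ni] > rng.random():
            ni = nj
        samples.append(ni)
    return samples

# hypothetical target: weights proportional to 1/n**2 on n = 1..4,
# so the chain should visit n = 1 most often
p = [0.0, 1.0, 0.25, 1/9, 1/16]
s = metropolis_sample(p, nmax=5, n_steps=20000)
```

After many steps the visit frequencies approach the normalized weights, which is what lets the sums over visited states in `metropolis` estimate expectation values without ever normalizing the wave function.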
    # optimizing operator for constants a
    # returns scalar
    def operator_a(self, n, i, flag):
        return self.gauss(n, self.C[flag][i], self.B) / self.psi(n, flag)

    # optimizing operator for constants c
    # returns scalar
    def operator_c(self, n, i, flag):
        return (2*self.A[flag][i]*np.abs(self.B)*(n - self.C[flag][i])
                * self.gauss(n, self.C[flag][i], self.B) / self.psi(n, flag))
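The factor returned by `operator_c` is the logarithmic derivative of the trial wave function with respect to one RBF centre, (1/ψ)·∂ψ/∂C_i. For a wave function consisting of a single Gaussian term this can be sanity-checked against a finite difference; the values of `A`, `B`, `c` and `n` below are arbitrary test numbers, not taken from the thesis:

```python
import numpy as np

def gauss(n, c, b):
    # one RBF term: exp(-|b| (n - c)^2)
    return np.exp(-np.abs(b) * (n - c)**2)

A, B, c, n = 1.3, 0.7, 2.0, 3.0
psi = A * gauss(n, c, B)

# analytic log-derivative, same form as operator_c
analytic = 2 * A * np.abs(B) * (n - c) * gauss(n, c, B) / psi

# central finite difference of psi with respect to c
h = 1e-6
numeric = (A*gauss(n, c + h, B) - A*gauss(n, c - h, B)) / (2*h) / psi
```

The two values agree to within the finite-difference error, confirming the analytic expression used in the optimizing operator.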
    # calculates the expectation values exactly
    # input: iteration number, function operator_a, function operator_c, psi,
    # flag = 0 for exact expectation values
    # returns ground state energy E and matrices S and F
    def S_F_exact(self, k, operator_a, operator_c, psi, flag=0):
        l = self.hiddensize[flag]
        O = [np.zeros(l), np.zeros(l)]                          # <Oi>
        E = 0.                                                  # <Eloc>
        OO = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]   # <OiOj>
        EO = [np.zeros(l), np.zeros(l)]                         # <ElocOi>
        S = [np.zeros(shape=(l, l)), np.zeros(shape=(l, l))]    # S-matrix
        F = [np.zeros(l), np.zeros(l)]
        psi_sum = 0.
        for n in range(1, self.nmax):
            psi_sq = psi[n]**2
            E_loc = self.e_local(n)
            psi_n = psi[n]
            if self.a != 0:  # if linear potential
                for n1 in range(1, self.nmax):
                    if n1 == n:
                        E_loc += 0.5*self.a
                    elif np.mod(n1 + n, 2) != 0:
                        E_loc += (-8.*self.a*n*n1*psi[n1]
                                  / (((n1 - n)**2)*((n1 + n)**2)*(np.pi**2)*psi_n))
            E += E_loc*psi_sq
            psi_sum += psi_sq
            for i in range(l):
                O[0][i] += operator_a[n, i]*psi_sq
                O[1][i] += operator_c[n, i]*psi_sq
                EO[0][i] += E_loc*operator_a[n, i]*psi_sq
                EO[1][i] += E_loc*operator_c[n, i]*psi_sq
                for j in range(l):
                    OO[0][i, j] += operator_a[n, i]*operator_a[n, j]*psi_sq
                    OO[1][i, j] += operator_c[n, i]*operator_c[n, j]*psi_sq
        O[0] = np.divide(O[0], psi_sum)
        O[1] = np.divide(O[1], psi_sum)
        EO[0] = np.divide(EO[0], psi_sum)
        EO[1] = np.divide(EO[1], psi_sum)
        OO[0] = np.divide(OO[0], psi_sum)
        OO[1] = np.divide(OO[1], psi_sum)
        E = np.divide(E, psi_sum)
        for i in range(l):
            F[0][i] = EO[0][i] - E*O[0][i]
            F[1][i] = EO[1][i] - E*O[1][i]
            for j in range(l):
                S[0][i, j] = OO[0][i, j] - O[0][i]*O[0][j]
                S[1][i, j] = OO[1][i, j] - O[1][i]*O[1][j]
        r = np.maximum(100.*0.9**k, 1.e-4)
        for i in range(l):
            S[0][i, i] *= (1. + r)
            S[1][i, i] *= (1. + r)
        return [S, F, E]
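The element-by-element accumulation of S and F above builds the covariance structure S_ij = ⟨O_i O_j⟩ − ⟨O_i⟩⟨O_j⟩ and F_i = ⟨E_loc O_i⟩ − ⟨E_loc⟩⟨O_i⟩. The same quantities can be written compactly with numpy matrix products; the sketch below uses synthetic arrays `O_samples` and `E_samples` as hypothetical stand-ins for the per-state operators and local energies accumulated in the loop:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, l = 1000, 4
O_samples = rng.normal(size=(n_samples, l))  # O_i for each sampled state
E_samples = rng.normal(size=n_samples)       # E_loc for each sampled state

O_mean = O_samples.mean(axis=0)                   # <O_i>
OO_mean = O_samples.T @ O_samples / n_samples     # <O_i O_j>
EO_mean = E_samples @ O_samples / n_samples       # <E_loc O_i>
E_mean = E_samples.mean()

S = OO_mean - np.outer(O_mean, O_mean)   # covariance matrix S_ij
F = EO_mean - E_mean * O_mean            # force vector F_i
```

By construction S is a symmetric sample-covariance matrix with non-negative diagonal, which is why the code can safely scale its diagonal by (1 + r) as a regularization.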
    # update parameters A, C with metropolis expectation values
    # input: iteration number
    # returns scalar ground state energy
    def update(self, k):
        psi = []
        operator_a = np.zeros(shape=(self.nmax, self.hiddensize[1]))
        operator_c = np.zeros(shape=(self.nmax, self.hiddensize[1]))
        for n in range(self.nmax):
            psi.append(self.psi(n, 1))
            for i in range(self.hiddensize[1]):
                operator_a[n, i] = self.operator_a(n, i, 1)
                operator_c[n, i] = self.operator_c(n, i, 1)
        [S, F, E] = self.metropolis(k, 10000, psi, operator_a, operator_c)
        S_inv = [np.linalg.pinv(S[0]), np.linalg.pinv(S[1])]
        # update parameter A
        self.A[1] = self.A[1] - self.alpha[1]*np.dot(S_inv[0], F[0])
        # update parameter C
        self.C[1] = self.C[1] - self.alpha[1]*np.dot(S_inv[1], F[1])
        return E
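The parameter step above uses the Moore–Penrose pseudoinverse rather than an ordinary inverse because the sampled S-matrix can be singular or nearly so. A minimal illustration with a deliberately singular matrix (the numbers are arbitrary, chosen so that `F` lies in the range of `S`):

```python
import numpy as np

# a singular 3x3 "S-matrix": the third row/column copies the first,
# so np.linalg.inv would raise LinAlgError
S = np.array([[2.0, 0.5, 2.0],
              [0.5, 1.0, 0.5],
              [2.0, 0.5, 2.0]])
F = np.array([1.0, 2.0, 1.0])

S_inv = np.linalg.pinv(S)   # Moore-Penrose pseudoinverse
step = S_inv @ F            # update direction, as in self.A[1] -= alpha * S_inv @ F

# pinv returns the minimum-norm least-squares solution, so S @ step
# reproduces F here even though S has no ordinary inverse
residual = np.linalg.norm(S @ step - F)
```

Together with the (1 + r) diagonal scaling applied in `metropolis` and `S_F_exact`, this keeps the stochastic-reconfiguration step well defined throughout training.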
    # update parameters A, C with exact expectation values
    # input: iteration number
    # returns scalar ground state energy
    def update_exact(self, k):
        psi = []
        operator_a = np.zeros(shape=(self.nmax, self.hiddensize[0]))
        operator_c = np.zeros(shape=(self.nmax, self.hiddensize[0]))
        for n in range(self.nmax):
            psi.append(self.psi(n, 0))
            for i in range(self.hiddensize[0]):
                operator_a[n, i] = self.operator_a(n, i, 0)
                operator_c[n, i] = self.operator_c(n, i, 0)
        [S, F, E] = self.S_F_exact(k, operator_a, operator_c, psi)
        S_inv = [np.linalg.pinv(S[0]), np.linalg.pinv(S[1])]
        # update parameter A
        self.A[0] = self.A[0] - self.alpha[0]*np.dot(S_inv[0], F[0])
        # update parameter C
        self.C[0] = self.C[0] - self.alpha[0]*np.dot(S_inv[1], F[1])
        return E
def main():
    NN = neural_network()
    E1 = []
    E2 = []
    E = []
    K = []
    # e = 4.9348022  # a = 0
    # e = 5.92603    # a = 2
    # e = 6.89974    # a = 4
    # e = 8.79508    # a = 8
    e = 0.795078     # a = -8
    print('Iteration nbr, Exact Energy, Metropolis Energy')
    for k in range(100):
        e1 = NN.update_exact(k)
        e2 = NN.update(k)
        K.append(k)
        E1.append(e1)
        E2.append(e2)
        E.append(e)
        print(k, e1, e2)
    fig1 = plt.figure(dpi=100)
    plt.scatter(K, E1)
    plt.plot(K, E, c='k', linewidth=3.0)
    plt.title('Exact Energy' + ', alpha = ' + str(NN.getalpha()[0])
              + ', hidden neurons = ' + str(NN.gethiddensize()[0]))
    plt.xlabel('Iteration number')
    plt.ylabel('Energy')
    plt.show()
    fig2 = plt.figure(dpi=100)
    plt.scatter(K, E2)
    plt.plot(K, E, linewidth=3.0, c='k')
    plt.title('Metropolis Energy' + ', alpha = ' + str(NN.getalpha()[1])
              + ', hidden neurons = ' + str(NN.gethiddensize()[1]))
    plt.xlabel('Iteration number')
    plt.ylabel('Energy')
    plt.show()

if __name__ == '__main__':
    main()