
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Physics, Chemistry, and Biology

Master's thesis, 30 ECTS | Applied Physics and Electrical Engineering - Theory, Modelling, Computer Calculations

2020 | LITH-IFM-A-EX--20/3819--SE

Accelerating longitudinal spin fluctuation theory for iron at high temperature using a machine learning method

Marian Arale Brännvall

Supervisor: Björn Alling and Davide Gambino
Examiner: Rickard Armiento


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

Date: 2020-08-24
Division, Department: Theoretical Physics, Department of Physics, Chemistry and Biology, Linköping University
ISRN: LITH-IFM-A-EX--20/3819--SE
Language: English
Report category: Master's thesis (Examensarbete)

Title

Accelerating longitudinal spin fluctuation theory for iron at high temperature using a machine learning method

Author

Marian Arale Brännvall


Abstract

In the development of materials, the understanding of their properties is crucial. For magnetic materials, magnetism is an apparent property that needs to be accounted for. There are multiple factors explaining the phenomenon of magnetism, one being the effect of vibrations of the atoms on longitudinal spin fluctuations. This effect can be investigated by simulations, using density functional theory, and calculating energy landscapes. Through such simulations, the energy landscapes have been found to depend on the magnetic background and the positions of the atoms. However, when simulating a supercell of many atoms, calculating energy landscapes for all atoms consumes many hours on the supercomputer.

In this thesis, the possibility of using machine learning models to accelerate the approximation of energy landscapes is investigated. The material under investigation is body-centered cubic iron in the paramagnetic state at 1043 K. Machine learning enables statistical predictions to be made on new data based on patterns found in a previous set of data. Kernel ridge regression is used as the machine learning method. An important issue when training a machine learning model is the representation of the data in the so-called descriptor (feature vector representation) or, more specifically in this case, how the environment of an atom in a supercell is accounted for and represented properly. Four different descriptors are developed and compared to investigate which one yields the best result and why. Apart from comparing the descriptors, the results when using machine learning models are compared to those when using other methods to approximate the energy landscapes. The machine learning models are also tested in a combined atomistic spin dynamics and ab initio molecular dynamics (ASD-AIMD) simulation, where they are used to approximate energy landscapes and, from those, magnetic moment magnitudes at 1043 K. The results of these simulations are compared to the results from two other cases: one where the magnetic moment magnitudes are set to a constant value and one where they are set to their magnitudes at 0 K.

From these investigations it is found that using machine learning methods to approximate the energy landscapes does, to a large degree, decrease the errors compared to the other approximation methods investigated. Some weaknesses of the respective descriptors were detected and if, in future work, these are accounted for, the errors have the potential of being lowered further.


Acknowledgments

I would like to thank my supervisor Björn Alling for suggesting this diploma work and for your guidance. I also want to thank co-supervisor Davide Gambino for all your help and feedback. Thank you, Rickard Armiento, for being the examiner and for your expertise and input.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
1.1 Aim
2 Theoretical background
2.1 Quantum mechanics
2.2 Density functional theory
2.3 Magnetism in solids and longitudinal spin fluctuations
2.4 Machine learning
3 Computational details
3.1 Electronic structure calculations to obtain the dataset
3.2 Atomistic spin dynamics - ab initio molecular dynamics
3.3 Machine learning
4 Results
4.1 Dataset
4.2 Performance of ML models
4.3 Alternative approximation methods
4.4 High-temperature-high-pressure data
4.5 ASD-AIMD simulations
5 Discussion
5.1 Performance
5.2 Weaknesses in the descriptors
5.3 Including HTHP data
5.4 Bias
5.5 ASD-AIMD simulation
5.6 Additional note
6 Conclusions
6.1 Future work
References


List of Figures

2.1 Avoiding overfitting
2.2 Cross-validation
4.1 LSF energy landscapes
4.2 Distribution of parameters
4.3 Order of neighboring atoms in descriptor
4.4 MAE depending on training set size
4.5 Predicted and actual values of parameters
4.6 Predicted and actual values of landscape minimum
4.7 Predicted and actual values of magnetic moment size at 1043 K
4.8 Distances between descriptors and parameters
4.9 Voronoi volume approximation
4.10 Voronoi volume approximation, predicted and actual values of parameters
4.11 Voronoi volume approximation, predicted and actual values of magnetic moment size
4.12 Mean energy landscape approximation
4.13 Energy landscape minimum approximation
4.14 HTHP predicted and actual values
4.15 ASD-AIMD simulations
5.1 Comparison of MAEs of parameters
5.2 Comparison of MAEs of energy landscape minimums
5.3 Comparison of MAEs of magnetic moment size at 1043 K
5.4 Cases where descriptors should be same, example 1
5.5 Cases where descriptors should be same, example 2


List of Tables

4.1 Descriptor content
4.2 MAE for parameters
4.3 MAE for landscape minimum
4.4 MAE for magnetic moment size at 1043 K
4.5 Voronoi volume approximation, MAEs
4.6 Mean energy landscape approximation, MAEs
4.7 HTHP MAEs
4.8 ASD-AIMD simulations, MSD averages
4.9 ASD-AIMD simulations, pressure averages
5.1 Differences in descriptor performances


Chapter 1

Introduction

A material possessing magnetic properties contains a massive number of magnetic moments that are interacting with each other and their environment. The source of these magnetic moments is the electrons. The intrinsic angular momentum, the spin, of the electrons gives rise to a magnetic moment. These magnetic moments cancel each other out in an atom where there are no unpaired electrons, leaving the atom without a magnetic moment (unless a magnetic field is applied). With unpaired electrons (associated with unfilled shells), however, the electron spin magnetic moments might align and add up, giving a net magnetic moment localized to the atom. Considering a solid, the interactions between the magnetic moments can lead to long-range magnetic order such as ferromagnetism and antiferromagnetism. In a ferromagnetic material, the interactions among magnetic moments lead to the moments aligning in the same direction, resulting in the material being macroscopically magnetic. In the case of antiferromagnetism, and when there is no long-range order, the material will not show a magnetic field on a macroscopic level. Magnetism is, therefore, a collective phenomenon where the cooperation between electrons and, in extension, between magnetic moments leads to major differences between the magnetic properties of a macroscopic system and those of, e.g., individual atoms. [1–3]

To understand and further develop materials, theoretical simulations are a useful tool. These simulations have the potential of making important predictions of the properties of a material. In the case of magnetic materials, magnetism has to be accurately simulated to get a complete and proper picture of the properties of the material. Magnetism being a many-body effect makes this a very challenging task.

A general way of viewing magnetic moments is by considering them to be vectors associated with atoms, i.e. having magnitude and direction. The direction of the magnetic moment can fluctuate, called transversal spin fluctuations, and, likewise, the magnitude of the moment can fluctuate, called longitudinal spin fluctuations (LSF). These two magnetic degrees of freedom affect each other, but there is also an interplay between vibrations of the atoms and the magnetic fluctuations. One piece of the puzzle is the effect of vibrations on LSF. A supercell approach has been developed to investigate this aspect [4, 5]. The approach uses constrained density functional theory (DFT) to calculate the energy landscapes of the LSF and the magnitude of the magnetic moments. The LSF energy landscapes are affected by the magnetic background and the atomic positions [5], which leads to the question of whether there are any patterns within the magnetic and positional configurations determining how the LSF energy landscapes turn out. Calculating each energy landscape using DFT is costly for a system with a large number of atoms and magnetic moments. In other words, it takes up a lot of time on the supercomputer. An example is when performing ab initio molecular dynamics (AIMD) calculations, where the movements of the atoms are simulated. Knowing the LSF energy landscape of each atom, with its magnetic moment, is important for understanding how the different atoms will interact with each other, i.e. to make an accurate simulation of their movements. The LSF energy landscapes depend on the local atomic and magnetic

environment, which in an AIMD simulation changes at every time step; this means that at every step the energy landscape has to be determined. If it were possible to predict LSF energy landscapes using information from, e.g., the magnetic background and atomic positions, instead of explicitly calculating each landscape, the process of approximating LSF energy landscapes could be substantially accelerated using machine learning (ML) methods. ML methods are a collection of methods in which the ML model is trained based on given data and is then able to make predictions based on new data. ML problems can be grouped into different categories; one such category is supervised learning, in which the training set contains both inputs and outputs and the goal is to predict the outputs based on the inputs [6]. Supervised learning is the type of learning used in this thesis.

1.1 Aim

This work aims first and foremost to investigate the possibility of using machine learning models to predict the LSF energy landscapes. The scheme of predicting LSF energy landscapes is then investigated, analyzing, for example, how the input data should be structured in a vector called the descriptor.

Problem formulation

The investigation of an ML approach to determine LSF energy landscapes is based on body-centered cubic (bcc) iron (Fe) in the paramagnetic state at the transition temperature (Curie temperature, Tc) of 1043 K and at ambient pressure.

This work aims, more specifically, at answering the following questions:

1. Is it possible to use machine learning models to predict LSF energy landscapes?

2. How should the descriptor be constructed?

3. How small an error can we achieve, and how does it compare to other methods of approximating LSF energy landscapes?

4. Can an ML approach be combined with atomistic spin dynamics (ASD) and AIMD simulations?


Chapter 2

Theoretical background

As previously mentioned, the electrons and their spin give rise to magnetic moments. To understand the nature of electrons, the theory of quantum mechanics is needed, and the method of density functional theory enables these problems to be solved in practice. This is discussed in the following chapter. The background to the theory of LSFs and the machine learning algorithm used in this work are also presented.

2.1 Quantum mechanics

The wave function Ψ contains all the information of a system of $N$ nuclei and $n$ electrons:

\[
\Psi = \Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n, \sigma_1, \sigma_2, \ldots, \sigma_n, \mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_N, t) \tag{2.1}
\]

where $\mathbf{r}_i$ are the positions of the electrons, $\sigma_i$ the electron spins, and $\mathbf{R}_I$ the positions of the nuclei. This is the time-dependent wave function. Ψ is the solution to the Schrödinger equation (in atomic units):

\[
i \frac{\partial \Psi}{\partial t} = \hat{H} \Psi \tag{2.2}
\]

where $\hat{H}$ is the Hamiltonian. The form of the Hamiltonian depends on the system considered. For a system of multiple nuclei and electrons interacting with each other, the Hamiltonian can be written as:

\[
\hat{H} = -\frac{1}{2} \sum_{i=1}^{n} \nabla_i^2 - \frac{1}{2} \sum_{I=1}^{N} \frac{1}{M_I} \nabla_I^2 - \sum_{i,I} \frac{Z_I}{|\mathbf{r}_i - \mathbf{R}_I|} + \frac{1}{2} \sum_{i \neq j} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} + \frac{1}{2} \sum_{I \neq J} \frac{Z_I Z_J}{|\mathbf{R}_I - \mathbf{R}_J|} \tag{2.3}
\]

To simplify the problem, some approximations must be made. If there is no time dependence in the Hamiltonian, it is sufficient to consider only the time-independent Schrödinger equation for describing the system:

\[
\hat{H} \Psi = E \Psi \tag{2.4}
\]

where Ψ is the set of eigenstates of the Hamiltonian and $E$ the corresponding energy eigenvalues.

The Born-Oppenheimer approximation assumes, based on the fact that the masses of the nuclei are much larger than the masses of the electrons, that the nuclei can be seen as stationary from the electrons' point of view. This makes it possible to deal with the kinetic energy of the nuclei separately: the interaction between nuclei is seen as a constant, $E_{II}$, and the interacting electrons are in an external field of the fixed nuclei. The Hamiltonian of the electrons can be written as:

\[
\hat{H}_{\text{elec}} = -\frac{1}{2} \sum_{i=1}^{n} \nabla_i^2 - \sum_{i,I} \frac{Z_I}{|\mathbf{r}_i - \mathbf{R}_I|} + \frac{1}{2} \sum_{i \neq j} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|} = \hat{T} + \hat{V}_{\text{ext}} + \hat{V}_{\text{int}} \tag{2.5}
\]


where $\hat{T}$ is the kinetic energy operator for the electrons, $\hat{V}_{\text{ext}}$ is the electron-nucleus interaction seen as a potential acting on the electrons, and $\hat{V}_{\text{int}}$ is the electron-electron interaction.

With a periodic system, such as a crystal, the Bloch theorem makes it possible to consider only the unit cell, reducing the number of particles substantially. The wave function can then be written as:

\[
\Psi_{\mathbf{k}}(\mathbf{r}) = e^{i \mathbf{k} \cdot \mathbf{r}} u_{\mathbf{k}}(\mathbf{r}) \tag{2.6}
\]

where $\mathbf{k}$ is a vector in reciprocal space and $u_{\mathbf{k}}(\mathbf{r})$ is a periodic function with the periodicity of the crystal.

2.2 Density functional theory

Properties of a collection of atoms, for example a crystal, can be calculated by solving the Schrödinger equation. The solution of the Schrödinger equation is the wave function, which contains all the information of the system. However, the Schrödinger equation is a many-body problem, and solving it becomes impossible for a macroscopic system. Density functional theory is an approach to find solutions to the Schrödinger equation by using the electron density instead of the wave function, avoiding the need to solve the many-body problem exactly.

The Hohenberg-Kohn theorems

Density functional theory rests on two fundamental theorems put forward by Hohenberg and Kohn [7]. The first one states the following [2]:

For any system of interacting particles in an external potential $V_{\text{ext}}(\mathbf{r})$, the potential $V_{\text{ext}}(\mathbf{r})$ is determined uniquely, except for a constant, by the ground state particle density $n_0(\mathbf{r})$.

This means that with the ground state density, $n_0(\mathbf{r})$, all properties of the system are determined. The second theorem states that [2]:

A universal functional for the energy $E[n]$ in terms of the density $n(\mathbf{r})$ can be defined, valid for any external potential $V_{\text{ext}}(\mathbf{r})$. For any particular $V_{\text{ext}}(\mathbf{r})$, the exact ground state energy of the system is the global minimum value of this functional, and the density $n(\mathbf{r})$ that minimizes the functional is the exact ground state density $n_0(\mathbf{r})$.

If the actual form of the universal functional were known, the exact electron density could be found by varying the density function, $n(\mathbf{r})$. However, in practice, approximate forms of the functional are used. The total energy functional can be written as:

\[
E[n] = T[n] + E_{\text{int}}[n] + \int d^3 r \, V_{\text{ext}}(\mathbf{r}) \, n(\mathbf{r}) + E_{II} \tag{2.7}
\]

where the first term on the right-hand side is the kinetic energy of the electrons, the second term is the electron-electron interaction, the third term is the interaction of the electron density with the external potential, and the last term is the interaction between nuclei. The kinetic energy and electron-electron interaction functionals are not known.

The Kohn-Sham ansatz

The Kohn-Sham ansatz [8] assumes that the system of interacting particles can be replaced by a system of non-interacting particles that has the same ground state electron density as the


real system. The independent particles are subject, instead of to the external potential $V_{\text{ext}}(\mathbf{r})$, to an effective potential $V_{\text{eff}}(\mathbf{r})$:

\[
V_{\text{eff}}(\mathbf{r}) = V_{\text{ext}}(\mathbf{r}) + \int \frac{n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d\mathbf{r}' + V_{xc}(\mathbf{r}) \tag{2.8}
\]

where

\[
V_{xc} = \frac{\partial E_{xc}[n(\mathbf{r})]}{\partial n(\mathbf{r})} \tag{2.9}
\]

The effective potential contains three terms: the electron-nucleus interaction defined as a potential ($V_{\text{ext}}$), the Hartree potential, which describes the electrostatic interaction between electrons and the surrounding electron density (including a self-interacting contribution), and the final term, in which all the many-body exchange and correlation contributions (including the correction of the self-interacting contribution) are collected together. The last potential is called the exchange-correlation potential, $V_{xc}$.

The Kohn-Sham equations can be written as:

\[
\left[ -\frac{1}{2} \nabla_i^2 + V_{\text{eff}}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \epsilon_i \psi_i(\mathbf{r}) \tag{2.10}
\]

where $\psi_i$ are the Kohn-Sham eigenstates and $\epsilon_i$ are the eigenvalues of the non-interacting single particles. The electron density is given by:

\[
n(\mathbf{r}) = \sum_{i=1}^{N} |\psi_i(\mathbf{r})|^2. \tag{2.11}
\]

The Kohn-Sham total energy functional can be written as:

\[
E_{KS}[n] = -\frac{1}{2} \sum_{i=1}^{N} \langle \psi_i | \nabla^2 | \psi_i \rangle + \int V_{\text{ext}}(\mathbf{r}) \, n(\mathbf{r}) \, d\mathbf{r} + \frac{1}{2} \int \frac{n(\mathbf{r}) \, n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d^3 r \, d^3 r' + E_{II} + E_{xc}[n] \tag{2.12}
\]

where the terms on the right are, respectively: the kinetic energy of the non-interacting electrons, the Coulomb interaction between the electrons and the nuclei, the Coulomb interaction between electrons, the Coulomb interaction between nuclei, and the exchange-correlation functional. The problem is now $E_{xc}[n]$, whose exact form is not known and of which approximations have to be made.

Exchange-correlation functional

The exchange-correlation functional needs to be determined to solve the Kohn-Sham equations. Since the true formulation of this functional is not known, various approximate ones have been developed. For magnetic materials, the spin should be included so that the exchange-correlation functional is explicitly (and exclusively) dependent on the spin.

The local spin density approximation (LSDA) assumes that the exchange-correlation energy density at each point in space is the same as in a homogeneous electron gas with that particular density. The exchange-correlation energy is an integral over all space:

\[
E_{xc}^{\text{LSDA}}[n_\uparrow(\mathbf{r}), n_\downarrow(\mathbf{r})] = \int n(\mathbf{r}) \, \epsilon_{xc}^{\text{hom}}(n_\uparrow(\mathbf{r}), n_\downarrow(\mathbf{r})) \, d^3 r, \tag{2.13}
\]

where $\epsilon_{xc}^{\text{hom}}$ is the exchange-correlation energy of the homogeneous electron gas and $n_\uparrow$ and $n_\downarrow$ are the spin-up and spin-down electron densities, respectively.

Another widely used approximation of the exchange-correlation functional is the generalized gradient approximation (GGA). GGA accounts for both the densities, $n_\uparrow$ and $n_\downarrow$, and the gradients of the densities, $\nabla n_\uparrow$ and $\nabla n_\downarrow$, when approximating $E_{xc}$. One useful form of this functional was proposed by Perdew, Burke and Ernzerhof (PBE) [9].


2.3 Magnetism in solids and longitudinal spin fluctuations

The magnetic fluctuations of a magnetic moment can be either transversal, where the direction of the moment changes, or longitudinal. Longitudinal spin fluctuations are fluctuations of the magnetic moment magnitude. These magnetic degrees of freedom (DOF) are faster than the vibrational ones since they are intimately related to the electronic excitations. Rigid rotations of the magnetic moments (i.e. transversal spin fluctuations) are collective changes among the electrons and are, consequently, slower than LSFs, which are more closely related to the electronic excitations [4, 5].

The understanding of magnetic materials has been developed through two models: the localized magnetic moment model and the itinerant electron model [10]. Within the localized moments model, the magnetic moments are seen as localized to the atoms and the sizes of the moments are constant. The Heisenberg Hamiltonian is used to describe the interaction of localized magnetic moments:

\[
H_{\text{quant.}} = -\sum_{i \neq j} J_{ij} \, \hat{S}_i \cdot \hat{S}_j \tag{2.14}
\]

where $\hat{S}_i$ is the quantum spin operator at lattice site $i$ and $J_{ij}$ are interatomic exchange constants [11]. This is the quantum Heisenberg Hamiltonian (2.14), from which the classical Heisenberg Hamiltonian can be derived in the limit $\hbar \to 0$, $s \to \infty$ (where $s$ is the spin quantum number):

\[
H_{\text{cl.}} = -\sum_{i \neq j} J_{ij} \, \mathbf{m}_i \cdot \mathbf{m}_j \tag{2.15}
\]

where $\mathbf{m}_i$ is the magnetic moment vector at lattice site $i$ [11].

The itinerant electron model, on the other hand, is based on the Stoner description of band theory [10] and considers the system as a whole. Here, the magnetization is due to the spontaneous spin splitting of bands, and the magnetic moments are associated with the itinerant conduction electrons [1].

In an itinerant system, the sizes of the magnetic moments can fluctuate, i.e. there will be longitudinal spin fluctuations; this is not accounted for by the Heisenberg Hamiltonian. For temperatures above the transition temperature (in the paramagnetic state), a way of accounting for the LSF effects has been developed by extending the classical Heisenberg Hamiltonian (2.15) [4, 5, 11–13]:

\[
H_{\text{LSF}} = -\sum_{i \neq j} J_{ij} \, \mathbf{m}_i \cdot \mathbf{m}_j + \sum_i E(m_i) \tag{2.16}
\]

where $E(m_i)$ is the LSF energy, which describes how the energy of moment $i$ depends on the local moment magnitude $m_i$. $E(m_i)$ can be written as [4, 5]:

\[
E(m_i) = \sum_{n=0}^{\infty} a_n m_i^{2n} \approx a m_i^2 + b m_i^4. \tag{2.17}
\]

For an itinerant system in a paramagnetic background, the LSF energy landscape (2.17) will have a minimum at $m = 0$ and will depend heavily on the surrounding environment. The energy landscape of a localized system will, instead, have a minimum at a finite magnetic moment size and will, therefore, not depend as much on the surroundings [4].

For the LSF Heisenberg Hamiltonian (2.16), several different thermodynamic quantities can be calculated. One such quantity, which is used in this work, is the average magnetic moment magnitude at temperature $T$, $\langle m_i(T) \rangle$. Assuming coupling between transversal and


longitudinal DOFs, and with the first term in the LSF Heisenberg Hamiltonian (2.16) being zero in the paramagnetic state, $\langle m_i(T) \rangle$ is defined as [4, 5, 13]:

\[
\langle m_i \rangle = \frac{1}{Z_i} \int_0^{\infty} dm_i \, m_i^3 \, e^{-\frac{E_i(m_i)}{k_B T}} \tag{2.18}
\]

where $Z_i$ is the partition function,

\[
Z_i = \int_0^{\infty} dm_i \, m_i^2 \, e^{-\frac{E_i(m_i)}{k_B T}}. \tag{2.19}
\]

2.4 Machine learning

Machine learning is a form of statistical learning where the machine is taught to make predictions based on patterns. Supervised learning is a type of learning where a previously generated dataset, consisting of inputs and outputs, is used for training (the training set); based on it, the machine finds patterns between outputs and inputs, enabling it to make predictions on new data. The inputs are features relevant for determining the output and are assembled in what is called a feature vector representation, or a descriptor. The machine learning method used in this work is the kernel ridge regression (KRR) method. KRR with a Laplacian kernel has been used in previous work to, for example, predict formation energies of atomic crystal structures [14–16].

Kernel Ridge Regression

The nonlinear regression method KRR stems from performing the so-called kernel trick on the linear regression method ridge regression. In this section, both the ridge regression method and the kernel trick, which together make up the KRR method, are discussed.

Ridge Regression

The main idea of ridge regression is to use the least squares estimation method while adding a penalizing term to the minimization of the coefficients to avoid overfitting. For a more detailed description of the method, the reader is referred to Ref. [6]. From an input vector, i.e. the descriptor, $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, a function $f$ approximates the output, $y$. In ridge regression this function is given by:

\[
f(\mathbf{x}, \boldsymbol{\beta}) = \mathbf{h}(\mathbf{x}) \boldsymbol{\beta} = \sum_j h(\mathbf{x})_j \beta_j \tag{2.20}
\]

where $\mathbf{h}(\mathbf{x}) = (h(\mathbf{x})_1, h(\mathbf{x})_2, \ldots, h(\mathbf{x})_q)$ is some function acting on the vector $\mathbf{x}$, for example a polynomial, and $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_q)^T$ are the coefficients that are chosen so that the residual sum of squares (RSS) is minimized:

\[
\text{RSS} = \sum_{i=1}^{p} \left( y_i - f(\mathbf{x}_i, \boldsymbol{\beta}) \right)^2 = (\mathbf{y} - \mathbf{h}(\mathbf{X}) \boldsymbol{\beta}) \cdot (\mathbf{y} - \mathbf{h}(\mathbf{X}) \boldsymbol{\beta}) \tag{2.21}
\]

where $\mathbf{X}$ is the $p \times n$ matrix with all descriptors in the dataset as rows and $\mathbf{y}$ is the vector of all outputs in the dataset. As previously mentioned, ridge regression adds a penalty to the coefficients so that (2.21) becomes:

\[
\text{RSS} = (\mathbf{y} - \mathbf{h}(\mathbf{X}) \boldsymbol{\beta}) \cdot (\mathbf{y} - \mathbf{h}(\mathbf{X}) \boldsymbol{\beta}) + \lambda \boldsymbol{\beta} \cdot \boldsymbol{\beta} \tag{2.22}
\]

where λ is a constant. In Figure 2.1 the impact of adding this penalty term is illustrated. By simply minimizing the error (RSS) without the penalty term there is a risk of overfitting the


function to the training data (green solid line), i.e. it is perfectly fitted to the training data but performs poorly when faced with new data. By adding the penalty term (dashed black line), the function is instead not as well fitted to the training data but becomes better at making predictions from new data.

Figure 2.1: A simple example of how adding the penalty term in ridge regression helps avoid overfitting.

The RSS is minimized by differentiating (2.22):

\[
\nabla \left( (\mathbf{y} - \mathbf{h}(\mathbf{X}) \boldsymbol{\beta}) \cdot (\mathbf{y} - \mathbf{h}(\mathbf{X}) \boldsymbol{\beta}) + \lambda \boldsymbol{\beta} \cdot \boldsymbol{\beta} \right) = 0. \tag{2.23}
\]

Solving for $\boldsymbol{\beta}$ from (2.23) gives:

\[
\boldsymbol{\beta} = \left( \mathbf{h}(\mathbf{X})^T \mathbf{h}(\mathbf{X}) + \lambda I \right)^{-1} \mathbf{h}(\mathbf{X})^T \mathbf{y} \tag{2.24}
\]

where $I$ is the identity matrix.
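The closed-form solution (2.24) is straightforward to implement directly. The snippet below is a sketch, not code from this work; the quadratic feature map h(x) = (1, x, x^2) and the toy data are assumptions chosen only to illustrate the formula.

```python
import numpy as np

def ridge_fit(H, y, lam):
    """Solve Eq. (2.24): beta = (H^T H + lambda*I)^(-1) H^T y,
    where H = h(X) is the matrix of feature vectors (one row per sample)."""
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)

# Toy usage with a quadratic feature map h(x) = (1, x, x^2):
x = np.linspace(0.0, 1.0, 50)
H = np.column_stack([np.ones_like(x), x, x**2])
y = 2.0 * x**2 + 0.5
beta = ridge_fit(H, y, lam=1e-3)  # approximately (0.5, 0, 2)
```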

Kernel Ridge Regression and the Kernel Trick

A way to tackle a nonlinear problem is to map the data from the nonlinear space to a linear one with a mapping, $\Phi(\mathbf{x})$. This mapping is usually nontrivial, but thanks to the so-called kernel trick it need not be known explicitly. The ML algorithm is formulated in terms of the dot product of the data points, and for the mapped problem this is simply the dot product of the mapping functions. The kernel trick is the substitution of the dot product with a function, $k(\mathbf{x}, \mathbf{x}')$, called a kernel [17]:

\[
k(\mathbf{x}, \mathbf{x}') \equiv \langle \mathbf{x}, \mathbf{x}' \rangle = \langle \Phi(\mathbf{x}), \Phi(\mathbf{x}') \rangle. \tag{2.25}
\]

The function approximating the output can now be written as:

\[
f(\mathbf{x}, \boldsymbol{\alpha}) = \sum_j k(\mathbf{x}, \mathbf{x}_j) \, \alpha_j, \tag{2.26}
\]


where the coefficients $\boldsymbol{\alpha}$ are given, analogously to (2.24), by:

\[
\boldsymbol{\alpha} = (\mathbf{K} + \lambda I)^{-1} \mathbf{y} \tag{2.27}
\]

where $\mathbf{K}$ is the kernel matrix with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$. There are a number of different kernel functions, but in this work the Laplacian kernel is used:

\[
k(\mathbf{x}, \mathbf{x}') = e^{-\frac{\|\mathbf{x} - \mathbf{x}'\|_1}{\sigma}} \tag{2.28}
\]

where $\|\mathbf{x} - \mathbf{x}'\|_1 = \sum_i |x_i - x'_i|$ is the Manhattan norm and $\sigma$ is the kernel width. The kernel function is a similarity measure [17], and the Laplacian kernel gives values between 0 and 1, where the closer the value is to 1, the more similar the two vectors $\mathbf{x}$ and $\mathbf{x}'$ are. This is regulated by the kernel width: a large kernel width will make more vectors be categorized as similar. The kernel width, $\sigma$, along with the regularization parameter, $\lambda$, depend on the problem at hand and are determined by testing different values in a reasonable range based on experience.
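The whole KRR pipeline of Eqs. (2.26)–(2.28) fits in a few lines: training amounts to solving (K + λI)α = y, and prediction applies Eq. (2.26). The sketch below is an illustration, not the QML-based code used in this work; the function names and the toy data are assumptions.

```python
import numpy as np

def laplacian_kernel(X1, X2, sigma):
    """Eq. (2.28): K[i, j] = exp(-||x_i - x_j||_1 / sigma)."""
    dists = np.abs(X1[:, None, :] - X2[None, :, :]).sum(axis=2)
    return np.exp(-dists / sigma)

def krr_fit(X_train, y_train, sigma, lam):
    """Eq. (2.27): solve (K + lambda*I) alpha = y for the coefficients."""
    K = laplacian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)

def krr_predict(X_test, X_train, alpha, sigma):
    """Eq. (2.26): f(x) = sum_j k(x, x_j) * alpha_j."""
    return laplacian_kernel(X_test, X_train, sigma) @ alpha

# Toy usage with random descriptors and outputs:
rng = np.random.default_rng(0)
X, y = rng.random((100, 10)), rng.random(100)
alpha = krr_fit(X, y, sigma=10.0, lam=1e-4)
predictions = krr_predict(X[:5], X, alpha, sigma=10.0)
```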

Representation of data

For the machine learning model to be able to make accurate and good predictions, the representation of the input data in the descriptor is crucial. There are some requirements on what makes a good descriptor, which are stated in Refs. [14, 18]:

1. The descriptor should be complete and nondegenerate; it should contain all relevant features for the underlying problem.

2. The descriptor should be compact and unique; it should not contain features that are redundant to the underlying problem.

3. The descriptor should be descriptive; instances giving similar outputs should be represented by descriptors that are close.

4. The descriptor should be simple; generating the descriptor should require little computational effort.

In this work, relevant features could be those that describe the environment of an atom, such as the distances to neighboring atoms. The descriptor should be constructed so that two different energy landscapes generate two different descriptors. A redundant feature would be, for example, a rigid rotation of the cell of atoms. If the environments of two atoms are very similar, with perhaps only a slight difference in the directions of the neighboring moments, and their energy landscapes are similar, the descriptors should be close as well to fulfill requirement 3.

Mean absolute error and cross-validation

In this work, the mean absolute error (MAE) is used to evaluate the performance of the ML model. To get a fair estimation of the error, the ML model has to be validated on data that it has not been trained on. In addition to the training set, a test set is needed. The mean absolute error of the predictions made by the ML model is calculated by checking how the predicted properties of the new data match the actual properties:

\[
\text{MAE}_{\text{test}} = \frac{1}{n} \sum_{i=1}^{n} |y_i - f(\mathbf{x}_i, \boldsymbol{\alpha})| \tag{2.29}
\]

where $n$ is the size of the test set, $y_i$ are the known outputs, and $f(\mathbf{x}_i, \boldsymbol{\alpha})$ are the predicted outputs. However, with only one training set and one test set, the evaluation of the ML model performance would be limited to these two sets of data. A more accurate MAE would be given by training and validating on a number of different sets of data. In the k-fold cross-validation method, the total dataset is divided into k different parts, and these parts alternate


in acting as the test set while the rest are used as the training set. In this way a more reliable MAE is obtained. In Figure 2.2 this is illustrated by an example where k = 4.

Figure 2.2: Illustration of the cross-validation scheme when k = 4. The data is randomly split into 4 parts. One of these parts acts as a test set (colored circles), which is kept out when the model is trained on the remaining three parts (gray circles). This is iterated until each part has been used as a test set.

Cross-validation can be used to set the hyperparameters, λ and σ, in the manner presented in Ref. [19].
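A sketch of how k-fold cross-validation can be used to score one (σ, λ) pair is shown below, reusing the krr_fit and krr_predict functions from the KRR sketch above. This is an assumed illustration, not the Scikit-learn-based code of this work.

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_mae(X, y, sigma, lam, k=4):
    """Average MAE (Eq. (2.29)) over k folds for one hyperparameter pair."""
    maes = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        alpha = krr_fit(X[train_idx], y[train_idx], sigma, lam)
        pred = krr_predict(X[test_idx], X[train_idx], alpha, sigma)
        maes.append(np.mean(np.abs(y[test_idx] - pred)))
    return np.mean(maes)

# Hypothetical grid search over kernel widths from 10^-3 to 10^3:
# best_sigma = min(np.logspace(-3, 3, 13), key=lambda s: cv_mae(X, y, s, 1e-4))
```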


Chapter 3

Computational details

In the following chapter, the computational details of the DFT calculations and of the implementation of the machine learning algorithm are presented.

3.1 Electronic structure calculations to obtain the dataset

One set of data, made up of 752 data points, is from the system of bcc Fe at Tc and with a lattice parameter of 2.88 Å, corresponding to the equilibrium lattice parameter at 0 K expanded with the experimental thermal expansion. This system is, from now on, referred to as the Tc-system. Another set of data comes from a system of bcc Fe at 6000 K and 300 GPa, consisting of 54 data points. This system is referred to as the HTHP-system (high-temperature-high-pressure system).

The Vienna ab initio simulation package (VASP) [20] is used to obtain both sets of data and, in both cases, the calculations are performed on a supercell of 54 atoms (3x3x3 bcc unit cells). VASP is used with projector-augmented wave (PAW) potentials [21]. The exchange-correlation functional employed is the PBE-GGA functional [9]. For the Tc-system, a number of different snapshots, produced by combined atomistic spin dynamics - ab initio molecular dynamics (ASD-AIMD) [5, 22] simulations, are used to get different configurations, both atomic and magnetic. In the case of the HTHP-system, only one snapshot is used, and the atomic configuration of that snapshot comes from a non-magnetic molecular dynamics simulation made in Ref. [23]. The magnetic moments are set to have different directions [4]. Both the directions and sizes of the magnetic moments are constrained with the method offered in VASP, with a constraining parameter set to λ = 10 for the Tc-system and λ = 25 (in two steps: first λ = 10 and then λ = 25) for the HTHP-calculations. For the Tc-system, going from λ = 10 to λ = 25 results in an energy change of only about 1.5 meV, motivating why λ = 10 is enough for the Tc-system. In the Tc- and HTHP-calculations, the energy cutoffs are set to 500 eV and 400 eV, respectively. The k-point mesh is set to 3x3x3 for the Tc-system and to 5x5x5 for the HTHP-system.

The energy landscape of a specific atomic moment is calculated by fixed spin moment calculations. The magnetic moments are completely constrained, as described previously, and then the magnitude of the considered moment is varied in steps in the fixed background [4].

The Voronoi volume has been seen to correlate with the magnetic moment sizes in bcc Fe [4]. The Voronoi volume of a particle is the volume of the cell corresponding to all the space that is closer to that particle than to any other; here it is calculated using the open source software library Voro++ [24].


3.2 Atomistic spin dynamics - ab initio molecular dynamics

ASD-AIMD is a method that accounts for the coupling between spin fluctuations and lattice vibrations. The ASD and AIMD steps alternate throughout the simulation. From an initial positional configuration of the atoms (resulting, e.g., from disordered local moment (DLM)-AIMD simulations), the distance-dependent exchange interactions can be assigned to each pair of atoms. The distance-dependent exchange interactions, along with the atomic positions, are then used to determine an initial magnetic configuration. This is used in the following AIMD step, where the atomic positions are updated by calculating the forces acting on the atoms. From the new atomic positions, new exchange constants are obtained. With them, an ASD step can be performed to update the magnetic configuration, which is used for the next AIMD step, and so on. [5, 22]

The ASD-AIMD simulations are carried out, for Fe at Tc and with a lattice parameter of 2.88 Å, on a 54-atom supercell (3x3x3 bcc unit cells). The k-point mesh is Γ-centered and set to 2x2x2. The sizes and directions of the magnetic moments are constrained by the method offered in VASP, with λ = 10. The energy cutoff was set to 500 eV and the PBE-GGA functional was used. The constraint method in VASP is not ideal, so to get the desired magnetic moment sizes, the magnetic moment sizes fed into the simulation have to be scaled. In these simulations the magnetic moments are scaled by 0.7.

3.3 Machine learning

The code for implementing ML is written in the programming language Python [25]. The KRR method is employed through the Python toolkit for quantum machine learning (QML) [26]. Among other modules, QML offers a kernel module from which the Laplacian kernel is calculated. For the cross-validation scheme, functions from the Python machine learning library Scikit-learn [27] were used. The Python library Numpy [28] was used for linear algebra operations and for constructing arrays.


Chapter 4

Results

In the following chapter, the results for the different descriptor constructions are presented, along with the results of other methods of approximating the LSF energy landscapes and of adding HTHP data to the training and testing datasets. The results of the DFT calculations producing the dataset and the results of the ASD-AIMD simulations are also presented.

4.1 Dataset

As mentioned earlier, the idea of supervised learning is to build a model based on a set of data containing both outputs and inputs. In the subsections below, the outputs and inputs considered in this work are described. The data mainly come from calculations made on the Tc-system, following the computational details described in section 3.1. From ASD-AIMD simulations, several snapshots were obtained, giving different positional and magnetic configurations. For each magnetic moment in these configurations an LSF energy landscape can be calculated, i.e. each magnetic moment corresponds to one data point in the dataset.

Outputs - LSF energy landscapes

The outputs are the $a$ and $b$ parameters of the fourth-order polynomial (see section 2.3):

\[
a m_i^2 + b m_i^4 \tag{4.1}
\]

where $m_i$ is the magnitude of the $i$-th magnetic moment. In other words, the outputs are the different LSF energy landscapes. The $a$ and $b$ parameters have the units eV/µB² and eV/µB⁴, respectively, and they are predicted independently of each other. In Figure 4.1 the energy landscapes of the 54 atoms of one snapshot are shown.

Figure 4.1: LSF energy landscapes of 54 atoms.
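For context, extracting the a and b outputs from a DFT-sampled landscape is a linear least-squares fit onto the basis (m², m⁴). The sketch below is an assumed illustration, not the fitting code used in this work; it assumes the energies are referenced to the m = 0 energy so that Eq. (4.1) has no constant term.

```python
import numpy as np

def fit_landscape(m_values, energies):
    """Least-squares fit of Eq. (4.1): return (a, b) with E(m) ~ a*m^2 + b*m^4.
    Energies are assumed to be referenced to the m = 0 energy."""
    basis = np.column_stack([m_values**2, m_values**4])
    coeffs, *_ = np.linalg.lstsq(basis, energies, rcond=None)
    return coeffs[0], coeffs[1]

# Toy usage: recover the parameters from noiseless samples of a known landscape.
m = np.linspace(0.0, 3.5, 15)
a, b = fit_landscape(m, -0.05 * m**2 + 0.01 * m**4)  # ~ (-0.05, 0.01)
```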

Machine learning models for the a and b parameters are obtained both when only including data from the system of bcc Fe at Tc and when including data from high-temperature and high-pressure (HTHP) calculations (6000 K and 300 GPa). In total there are 752 data points from the Tc-calculations and 54 data points from the HTHP-calculations. In Figure 4.2 the spread of the values of the a and b parameters are shown as histograms.

Figure 4.2: Distribution of the values of each parameter, both with and without HTHP data. (a) Parameter a; (b) parameter b.

Inputs - Descriptor

The input data are represented by a descriptor. The different types of descriptors developed during this diploma work are summarized in Table 4.1. $V$ stands for the Voronoi volume, $e_i$ is the direction of moment $i$, and $m$ is the magnitude of the magnetic moment. Index 0 refers to the considered atom and indices 1 to $N$ to the nearest neighbors, where $N$ depends on how many shells of neighbors are included. Since the energy landscapes depend on the magnetic background, the descriptor needs to be constructed in a way that takes the environment of the atomic moments into account. As previously mentioned, the magnetic moment sizes correlate with the Voronoi volume in bcc Fe, making the Voronoi volume an apparent element of the descriptor. The magnetic field acting on the considered atomic moment can be taken into consideration by letting the positions and directions of the neighboring atomic moments be part of the descriptor. The directions of the magnetic moments are considered in different ways, either by their explicit directions or by the scalar products of their directions, where the scalar products give information about the angles between magnetic moments.

Table 4.1: Different constructions of the descriptor.

Descriptor nr.  Descriptor content
1               V, e_x0, e_y0, e_z0, r_1, θ_1, φ_1, e_x1, e_y1, e_z1, ..., r_N, θ_N, φ_N, e_xN, e_yN, e_zN
2               V, r_1, θ_1, φ_1, e_0·e_1, ..., r_N, θ_N, φ_N, e_0·e_N
3               V, r_1, θ_1, φ_1, ..., r_N, θ_N, φ_N, Σ_{n=1..N} e_0·e_n
4               V, m_0, r_1, θ_1, φ_1, e_0·e_1, ..., r_N, θ_N, φ_N, e_0·e_N

The order in which the neighboring atoms are sorted in the descriptor is based on their ideal lattice positions. In Figure 4.3 this is shown in 2D for simplicity. The considered atom is placed at the origin. From the ideal lattice positions (the left side of Figure 4.3), the neighboring atoms are sorted by the θ and φ angles and by their distance, r, to the considered atom. The non-ideal positions (the right side of Figure 4.3) are then given in the descriptor.

Figure 4.3: 2D description of how the order, in the descriptor, of the neighboring atoms is set.

Descriptor 1: Contains the Voronoi volume of the considered atom and the x-, y-, and z-components of its normalized magnetic moment vector. It is followed by the spherical coordinates and the x-, y-, and z-components of the normalized magnetic moment vector of each neighboring atom.

Descriptor 2: Contains the Voronoi volume of the considered atom. This descriptor does not contain, unlike the previous one, the components of the normalized magnetic moments of any atoms. Instead it contains the scalar product between the normalized magnetic moment vector of the considered atom and that of each of the neighboring atoms, in addition to their spherical coordinates.

Descriptor 3: Contains the Voronoi volume of the considered atom and the spherical coordinates of each neighboring atom. It also contains the sum of all the scalar products described previously.

Descriptor 4: This descriptor is almost identical to descriptor 2, with the exception that it also contains the magnitude of the magnetic moment at "0 K" of the considered atom. The "0 K" magnitude of the moment is given by a simulation in VASP where the magnetic moments have been allowed to relax in size.
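To make the construction concrete, the sketch below assembles descriptor 2 from Table 4.1 for one atom. It is an assumed illustration, not the code of this work; the neighbor displacements are taken to be pre-sorted according to the ideal-lattice scheme of Figure 4.3.

```python
import numpy as np

def spherical(vec):
    """Cartesian displacement -> spherical coordinates (r, theta, phi)."""
    r = np.linalg.norm(vec)
    return r, np.arccos(vec[2] / r), np.arctan2(vec[1], vec[0])

def descriptor_2(voronoi_volume, e0, neighbor_displacements, neighbor_moments):
    """Descriptor 2: V, then (r, theta, phi, e0.e_n) for each sorted neighbor.
    e0 and neighbor_moments are normalized magnetic moment direction vectors."""
    features = [voronoi_volume]
    for disp, e_n in zip(neighbor_displacements, neighbor_moments):
        features += [*spherical(disp), float(np.dot(e0, e_n))]
    return np.array(features)
```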

4.2 Performance of ML models

The 752 data points from the Tc-system are used to generate the following results. The case when high-temperature and high-pressure data are included is presented in section 4.4. The performance of the machine learning models depends on, for example, how the descriptor is constructed, on the hyperparameters λ and σ (described in section 2.4), and on the amount of training data. In this section the performance of each descriptor construction is described. The regularization parameter, λ, is set to 10⁻⁴ for all descriptors; changing it did not have a big impact on the error. The kernel width, σ, is the one giving the minimum MAE when trying different values, ranging from 10⁻³ to 10³. The MAE versus kernel width graphs for all descriptors are shown in appendix A.

MAE and size of the training set

Figure 4.4 shows the MAE versus the size of the training dataset for each of the descriptors; the kernel widths are set according to appendix A.

Figure 4.4: The mean absolute error as a function of the size of the training set, for each descriptor and for parameters a and b. One graph for each number of shells included.

In Table 4.2 the minimum MAEs obtained, and the numbers of shells they correspond to, are listed for each descriptor.

Table 4.2: MAE for parameters a and b for each descriptor.

        Descriptor 1       Descriptor 2        Descriptor 3        Descriptor 4
MAE_a   0.0167 (1 shell)   0.0115 (3 shells)   0.0116 (1 shell)    0.00767 (1 shell)
MAE_b   0.00138 (1 shell)  0.000989 (1 shell)  0.000973 (1 shell)  0.000798 (1 shell)

Predicted and actual values

Scatter plots, where predicted values are set against actual values, are made to visualize how well each descriptor predicts the parameters; these are presented in Figure 4.5.

Figure 4.5: Scatter plots showing the comparison between predicted and actual values of parameters a and b.

From the a- and b-parameters, the minima of the LSF energy landscapes can be calculated. In Figure 4.6 the scatter plots comparing the predicted and actual values of the energy minimum and the magnetic moment size at the minimum (i.e. the "0 K" magnetic moment size) are shown.

Figure 4.6: Scatter plots showing the comparison between predicted and actual values of the LSF energy landscape minimum, both energy and magnetic moment magnitude.

From the predicted and actual values of the minimum of the energy landscapes, the MAEs can be calculated; these are given in Table 4.3.

Table 4.3: MAE for energy and magnetic moment magnitude at the energy landscape minimum, for each descriptor.

                    Descriptor 1  Descriptor 2  Descriptor 3  Descriptor 4
MAE_energy (eV)     0.0442        0.0347        0.0314        0.0217
MAE_moment0K (µB)   0.128         0.121         0.0907        0.0758

From the a- and b-parameters, the magnetic moment sizes at 1043 K can be calculated numerically, as described in section 2.3. In Figure 4.7 the scatter plots of the predicted versus actual values of these magnetic moment sizes are shown. In Table 4.4 the errors of these predictions are given.

Figure 4.7: Predicted and actual magnetic moment magnitudes at 1043 K.

Table 4.4: MAE for magnetic moment magnitude at 1043 K.

                       Descriptor 1  Descriptor 2  Descriptor 3  Descriptor 4
MAE_moment1043K (µB)   0.0581        0.0735        0.0469        0.0403

Distances between descriptors

The scatter plots in Figure 4.8 are made to visualize how well the different constructions of descriptors detect similarities between data points. Two magnetic moments with very similar LSF energy landscapes (i.e. similar a- and b-parameters) should be described by similar descriptors.

Figure 4.8: Comparison of distances between descriptors and corresponding distances between parameters.

4.3 Alternative approximation methods

To understand how well the approach of using ML for predicting LSF energy landscapes works, it is necessary to have alternative approximation methods to compare with. In the following section, some methods of approximating properties of LSF energy landscapes are presented. In all the approximation methods described, the 752 data points from the Tc-system are used.

Voronoi volume

One way of approximating the LSF energy landscapes is by simple linear regression, where the Voronoi volume is the only variable determining the parameters; this is shown in Figure 4.9.

Figure 4.9: Simple linear regression models for parameters a and b, based on the Voronoi volume.
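A sketch of this baseline is given below: ordinary least squares of a landscape parameter on the Voronoi volume alone. The arrays are dummy stand-ins for the real dataset; nothing here is from the thesis code.

```python
import numpy as np

# Dummy stand-ins for the 752 Voronoi volumes and fitted a-parameters:
rng = np.random.default_rng(1)
volumes = rng.normal(11.9, 0.5, 752)                    # hypothetical, in Angstrom^3
a_params = 0.02 * volumes + rng.normal(0.0, 0.01, 752)  # hypothetical values

# Fit a straight line a(V) = slope*V + intercept and evaluate its MAE:
slope, intercept = np.polyfit(volumes, a_params, deg=1)
a_pred = slope * volumes + intercept
print("MAE:", np.mean(np.abs(a_params - a_pred)))
```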


In Figure 4.10 the predicted and actual values of the parameters, when using this Voronoi volume approximation, are shown as scatter plots.

Figure 4.10: Predictions based on the Voronoi volume approximation versus actual values of each parameter.

In Figure 4.11 the scatter plot of predicted magnetic moment sizes at 1043 K compared to actual sizes (when using the Voronoi volume approximation) is shown. In Table 4.5 the errors are presented for the different properties.

Figure 4.11: Predictions based on the Voronoi volume approximation versus actual values of magnetic moment magnitudes at 1043 K.


Table 4.5: MAEs, when using the Voronoi volume approximation, of the parameters, the energy and magnetic moment size at the minimum of the energy landscapes, and the magnetic moment size at 1043 K.

MAE_a                 0.0228
MAE_b                 0.0018
MAE_energy (eV)       0.0587
MAE_moment0K (µB)     0.180
MAE_moment1043K (µB)  0.176

Mean LSF energy landscape

One approximation method is to simply let the mean LSF energy landscape be an approximation of all energy landscapes, as illustrated in Figure 4.12. In the figure, all 752 energy landscapes are plotted as thin blue lines.

Figure 4.12: Approximating all LSF energy landscapes (thin blue lines) to be the mean energy landscape (thick red line).

The errors of the predictions are given in Table 4.6.

Table 4.6: MAEs, when using the mean landscape approximation, of the parameters, the energy and magnetic moment size at the minimum of the energy landscapes, and the magnetic moment size at 1043 K.

MAE_a                 0.0285
MAE_b                 0.0019
MAE_energy (eV)       0.0816
MAE_moment0K (µB)     0.257
MAE_moment1043K (µB)  0.131

Approximating the LSF energy landscape minimum

Here, the magnetic moment size at the LSF energy landscape minimum is approximated by the "0 K" magnitude of the moment. These "0 K" magnitudes are obtained by constraining the


magnetic moment directions, but not their sizes, using VASP. The scatter plot of the predicted versus actual magnetic moment sizes at the energy landscape minimum is shown in Figure 4.13.

Figure 4.13: Scatter plot of the relaxed magnetic moment sizes as approximations for the magnetic moment corresponding to the energy landscape minimum.

The MAE is, in this case, 0.1069 µB.

4.4 High-temperature-high-pressure data

Data from the system of bcc Fe at 6000 K and 300 GPa are added to both the training set and the test set. Here, descriptor 3 has been used. In Figure 4.14 the predicted versus actual values are shown for the parameters and the energy landscape minimum.

Figure 4.14: Predicted and actual values of the parameters and the LSF energy landscape minimum.

In Table 4.7 the errors when including HTHP data are given.

Table 4.7: MAEs, when including HTHP data in the training and validation sets, of the parameters and of the energy and magnetic moment size at the minimum of the energy landscapes.

MAE_a               0.0127
MAE_b               0.00105
MAE_energy (eV)     0.0291
MAE_moment0K (µB)   0.0868

4.5 ASD-AIMD simulations

An ASD-AIMD simulation using the ML models (trained on the Tc-data and using descriptor 3) has been conducted at 1043 K, named the "ML" simulation. This approach is compared with a simulation where no LSFs are included, which is referred to as the "0 K" simulation, and a simulation where the sizes of the magnetic moments are held constant at 2.11 µB, which is referred to as the "constant sizes" simulation.

In Figure 4.15 the variation of the magnetic moment size, of a randomly picked moment, over time is shown along with the cumulative average size. In the figure the histograms of the occurrence of different magnetic moment sizes are also shown for each simulation.

Figure 4.15: The plots to the left show the magnetic moment size over time for a selected magnetic moment (thin dashed line) and the average moment size at each time step (thick solid line). The plots to the right are histograms describing the frequency of different magnetic moment sizes. From top to bottom: the "0 K", "ML", and "constant sizes" simulations.

In Tables 4.8 and 4.9 the mean square displacement and the pressure are shown, respectively, for the three different simulations. The final average and the associated standard error are given for each simulation.


Table 4.8: Final averages and standard errors of the mean square displacement in each simulation.

                Final average ± standard error
0 K             0.0811 ± 2.00 × 10⁻⁴ Å²
ML              0.0802 ± 2.81 × 10⁻⁴ Å²
Constant sizes  0.0724 ± 2.10 × 10⁻⁴ Å²

Table 4.9: Final averages and standard errors of the pressure in each simulation.

      Final average ± standard error
0 K   -32.4 ± 0.089 kbar
ML    -28.9 ± 0.11 kbar


Chapter 5

Discussion

In the following chapter the results previously presented are discussed. The performance of the different descriptors is compared and their respective weaknesses and strengths are investigated.

5.1 Performance

From Figure 4.4 we see that increasing the training set size, in general, decreases the MAE of the parameters, but less so for descriptor 3, where there is not much improvement beyond a training set of 200 data points. In general, a bigger training set seems to lower the MAE of the parameters, which suggests that the training works as it should.
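A learning curve of this kind can be generated by retraining on nested subsets of increasing size; the sketch below uses synthetic data and a kernel ridge regressor purely as placeholders, since the text does not fix the model used here.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
# Synthetic stand-ins for descriptors and the a-parameter target.
X = rng.normal(size=(752, 30))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=752)
X_test, y_test = X[600:], y[600:]  # held-out points

# MAE versus training-set size, as in Figure 4.4.
for n_train in (50, 100, 200, 400, 600):
    model = KernelRidge(alpha=1e-3, kernel="rbf").fit(X[:n_train], y[:n_train])
    print(n_train, mean_absolute_error(y_test, model.predict(X_test)))
```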

The errors presented in sections 4.2 and 4.3 are compared with each other in Figures 5.1, 5.2 and 5.3. The alternative approximation methods (described in section 4.3) are named the Voronoi volume approximation, the mean landscape approximation and the minimum approximation. Using ML is an improvement compared to the mean energy landscape approximation and the Voronoi volume approximation. The minimum approximation is better at predicting the magnetic moment size at the energy landscape minimum than ML with descriptors 1 and 2, but the ML error becomes slightly lower with descriptors 3 and 4.

Figure 5.1: The errors for the parameters a (panel a) and b (panel b) when using the different descriptors and the other approximation methods.


Figure 5.2: The errors for the energy landscape minimum (panel a: energy; panel b: magnetic moment size) when using the different descriptors and the other approximation methods.

Figure 5.3: The errors for the magnetic moment size at 1043 K when using the different descriptors.

Descriptor 4 gives the lowest errors. However, an important aspect of descriptor 4 is the inclusion of the size of the relaxed magnetic moment of the considered atom, denoted m0, i.e. the "0 K" magnitude. It makes sense that including m0 would improve the MAE; essentially, the descriptor is given a hint on where the minimum of the energy landscape should be. Including m0 requires one extra simulation on the supercomputer, in which the magnetic moments are allowed to relax in size. With this in mind, descriptor 4 is not necessarily the prime descriptor of the four presented, despite its low errors. The differences in MAEs are reflected in the scatter plots in Figure 4.5. Comparing the scatter plots, it is clear that descriptor 1 does not perform as well as the other descriptors: small parameters are predicted to have larger values and vice versa, more so for parameter b, but the tendency can be seen for both parameters. Descriptors 2 and 3 give very similar MAEs for the parameters, and for both descriptors we see, from the scatter plots in Figure 4.5, that there are difficulties in predicting small b-parameters; this is also seen, to a lesser extent, for descriptor 4. Even though the errors of the parameters a and b are very similar for descriptors 2 and 3, the difference is somewhat larger for the energy landscape minimum, where descriptor 3 performs slightly better. There is a trend that, going from descriptor 1 to descriptor 4, the error decreases (or does not change much), except for the magnetic moment size at 1043 K, seen in Figure 5.3. In this case, descriptor 2 performs worse than descriptor 1; the reason for this is not clear.

In Table 5.1 the percentage decrease or increase of the MAE when going from one descriptor to another is shown.

Table 5.1: Percentage change of the errors from one descriptor to another.

                 1→2      2→3      3→4
MAEa            -31 %    +0.85 %  -34 %
MAEb            -28 %    -1.6 %   -18 %
MAEenergy       -21 %    -9.5 %   -31 %
MAEmoment0K     -5.5 %   -25 %    -16 %
MAEmoment1043K  +27 %    -36 %    -14 %
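The table entries are plain relative changes, (MAE_new − MAE_old)/MAE_old; a worked example with hypothetical MAE values (the per-descriptor MAEs themselves appear only in the figures):

```python
def percent_change(mae_old, mae_new):
    """Relative change in MAE when switching descriptors, as in Table 5.1."""
    return 100.0 * (mae_new - mae_old) / mae_old

# Hypothetical values: an MAE_a of 0.029 for descriptor 1 falling to 0.020
# for descriptor 2 gives roughly the -31 % quoted in the text.
print(percent_change(0.029, 0.020))  # -> about -31
```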

By employing descriptor 2 instead of descriptor 1, MAEa is lowered by ~31 % and MAEb by ~28 %. This improvement can be compared with the improvement of applying machine learning for predicting the energy landscapes over using the other approximation methods. MAEa is lowered by ~27 % when applying machine learning with descriptor 1 compared to the Voronoi volume approximation, and MAEb is lowered by ~23 %. Compared to the mean landscape approximation, MAEa is lowered by ~41 % and MAEb by ~27 % with machine learning and descriptor 1. Even with the worst performing descriptor, the MAEs of both parameters are lowered when using machine learning, and by employing a better descriptor it is plausible to reach even lower errors.

5.2 Weaknesses in the descriptors

Why does descriptor 1 result in higher MAEs, in general, than the rest of the descriptors? Two of the requirements, numbers two and three (in section 2.4), stated that the descriptor should contain as few redundant features as possible and be constructed in such a way that instances giving similar outputs are described by descriptors that are close. Rotation of the cell of atoms is a redundant feature that should not alter the descriptor. In Figure 5.4 a simple example of when two descriptors should be the same is illustrated. With descriptor 1, the right and left hand sides of Figure 5.4 would not generate the same descriptor, since the directions of all magnetic moments are considered. With the rest of the descriptors, however, these two cases would generate the same descriptor, because the scalar products of the directions of the magnetic moments are considered instead of the explicit directions.


Figure 5.4: Simple illustration of two cases where the descriptors should be the same. The red circle is the considered atom and the blue ones are its neighbors, the black arrows symbolize the magnetic moments.
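The invariance argument can be made concrete: a global rotation changes every moment direction but leaves the scalar products between directions unchanged. A minimal sketch with hypothetical unit directions:

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical unit moment directions: the considered atom and one neighbor.
e0 = np.array([0.0, 0.0, 1.0])
e1 = np.array([1.0, 0.0, 0.0])
R = rotation_z(0.7)

# An explicit direction (descriptor-1-style feature) changes under rotation ...
print(np.allclose(e1, R @ e1))                   # False
# ... while the scalar product used in descriptors 2-4 does not.
print(np.isclose(e0 @ e1, (R @ e0) @ (R @ e1)))  # True
```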

This could be an explanation for why descriptor 1 performs worse than the others. Nonetheless, this is a very simplified example: the positions of the atoms in Figure 5.4 are ideal, and the directions of the moments are simply up or down. With random directions of the moments and vibrating atoms, as in Figure 5.5, where the left configuration is the same as the right except rotated 180°, all descriptors would fail to categorize the two cases as the same.


Figure 5.5: Another example where the descriptors of the two cases, left and right in the figure, should be the same. The red circle represents the considered atom and the blue ones are its neighboring atoms.

The reason the descriptors are not able to see these two cases as the same is the way the neighboring atoms are sorted in the descriptor. The sorting is based only on the ideal positions of the atoms, as explained in section 4.1, leading to the elements of the descriptor being the same for both cases but ordered differently. This is an important problem for descriptors 2, 3 and 4, and if it were solved the errors could potentially be lowered significantly. Note, however, that descriptor 3 contains the sum of the scalar products instead of each scalar product, which should lead to the two descriptors, in the cases of Figure 5.5, being slightly closer than for descriptors 2 and 4. This is possibly the reason why descriptor 3 slightly outperforms descriptor 2 in some cases.
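The sorting problem can be illustrated in the same spirit: a 180° rotation of the cell permutes the ideal-position order of the neighbors, so a per-neighbor list of scalar products changes while their sum does not. A sketch with hypothetical values:

```python
import numpy as np

# Hypothetical scalar products between the considered moment and five
# neighbors, listed in the ideal-position order used by the descriptors.
products = np.array([0.9, -0.2, 0.5, 0.1, -0.7])

# The rotated cell contains the same physical neighborhood, but the
# ideal-position sorting lists the neighbors in a different order.
permuted = products[::-1]

print(np.array_equal(products, permuted))          # False: descriptors 2 and 4 differ
print(np.isclose(products.sum(), permuted.sum()))  # True: descriptor 3 agrees
```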

The scatter plots in Figure 4.8 are made to illustrate the similarities of descriptors and how they correlate with similarities in the parameters. As mentioned before, similar a-values should correspond to similar descriptors. For descriptors 3 and 4 there is a slightly larger tilt, suggesting that descriptors that are further apart do, in fact, correspond to parameters that are further apart. As explained previously, descriptor 3 containing the sum of scalar products makes it possible to spot similarities slightly better than including each scalar product, and the effect of this is probably what is shown in Figure 4.8 (e) and (f).


5.3 Including HTHP data

From Figure 4.14 and Table 4.7 it can be seen that including HTHP data in general worsens the errors, which is to be expected since the majority of the data still comes from the Tc system. One exception is the magnetic moment size at the energy minimum, i.e. the "0 K" size. Interestingly, the ML model still manages to distinguish the HTHP data from the rest. This is clear from Figure 4.14(a), where the a-parameters of the HTHP data are clustered around 0.25, clearly separated from the other a-parameters. The b-parameters of the HTHP data do not differ as much (as was also seen in Figure 4.2(b)).

5.4 Bias

There is one obvious bias in the set of training data and, by extension, in the machine learning models: all or, when HTHP data is included, more than 90 % of the data come from bcc Fe at Tc and ambient pressure. To obtain more generalized ML models, this could be accounted for by including data from a wider range of systems. Figure 4.2, describing the spread of a and b values in the dataset with and without HTHP data, gives a hint of how the ML models could be made less biased given data from different systems. For the a-parameters, inclusion of HTHP data gives more a-values larger than 0; for the b-parameters, including HTHP data does not change the range of values much. In summary, if the aim were a more general ML model, the histograms shown in the mentioned figure should be flatter and, if possible, span a broader spectrum.

5.5 ASD-AIMD simulation

From Figure 4.15 we see that without inclusion of LSFs, i.e. in the "0 K"-simulation, the sizes of the magnetic moments fluctuate much more than in the other cases and, at some points, go down to very low values (this also makes the average magnetic moment size fluctuate more in the "0 K"-simulation). With the ML models predicting the magnetic moment sizes, the small magnitudes, which can cause computational problems, are avoided. The fluctuations are, as expected, very small when the magnetic moment sizes are constrained to a constant value.

The final average of the MSD is smallest in the "constant sizes"-simulation, suggesting that, when the magnetic moments are kept at a constant size, the atoms do not move as much as when the magnetic moment sizes change. The pressure is highest when the ML models are used to determine the sizes of the moments.

5.6 Additional note

The entire data set contains 752 data points (plus 54 data points of HTHP data), but it would of course be possible to extend this dataset further. As was seen, larger training datasets did, in general, decrease the MAE, so it would most likely be beneficial to expand it. A question arises whether it is possible to generate an exhaustive amount of data, i.e. a situation where there would be almost no new data from which the ML models could make predictions. This could be the case especially if data kept being generated from the same system. Adding more systems would be favorable, not only to generalize the ML models but also because it would enable the generation of much more data, with a decreased risk of it being exhaustive.
