
Diploma project | LITH-IFM-A-EX—21/3922–SE | The Department of Physics, Chemistry and Biology (IFM)

Active learning of interatomic potentials to investigate thermodynamic and elastic properties of Ti0.5Al0.5N at elevated temperature

Florian Bock

2021-05-20

Supervisor: Senior Lecturer Ferenc Tasnádi, PhD
Examiner: Professor Igor A. Abrikosov, PhD

Linköpings universitet, SE-581 83 Linköping, 013-28 10 00, www.liu.se


Acknowledgements

I would like to thank my supervisor Ferenc Tasnádi for tirelessly teaching, supporting and correcting me throughout this project. I am truly grateful that his virtual door was always open when I needed to discuss anything. I would also like to thank the group of Theoretical Physics (TEOFY) at the Department of Physics, Chemistry and Biology (IFM) for treating me like a colleague since the day I started working with them.

No one, however, has provided me with greater support than my lovely fiancée Kristina. Thank you for making the long home office hours bearable, and for supporting me around the clock. This work would not have been possible without you!


Abstract

With the immense increase in the computational power available to the materials science community in recent years, a range of new discoveries has been made possible. Accurate investigations of large scale atomic systems, however, still come with an extremely high computational demand. While the recent development of Graphics Processing Unit (GPU) accelerated supercomputing might offer a solution to some extent, most well known electronic structure codes have yet to be fully ported to utilize this new power. With a soaring demand for new and better materials from both science and industry, a more efficient approach for the investigation of material properties needs to be implemented. The use of Machine Learning (ML) to obtain Interatomic Potentials (IPs) which far outperform the classical potentials has increased greatly in recent years. With successful implementations of ML methods utilizing neural networks or Gaussian basis functions, the accuracy of ab-initio methods can be achieved at the computational demand of simulations with empirical potentials. Most ML approaches, however, require high accuracy data sets to be trained sufficiently. If no such data is available for the system of interest, the immense cost of creating a viable data set from scratch can quickly negate the benefit of using ML.

In this diploma project, the elastic and thermodynamic properties of the Ti0.5Al0.5N random alloy at elevated temperature are therefore investigated using an Active Learning (AL) approach with the Machine Learning Interatomic Potentials (MLIP) package. The obtained material properties are found to be in good agreement with results from computationally demanding ab-initio studies of Ti0.5Al0.5N, at a mere fraction of the demand. The AL approach requires no high accuracy data sets or previous knowledge about the system, as the model is initially trained on low accuracy data which is removed from the training set (TS) at a later stage. This allows for an iterative process of improving and expanding the data set used to train the IP, without the need for large amounts of data.


Sammanfattning

With the great increase in computational power available for materials science research in recent years, a range of new discoveries has become possible. Accurate investigations of large-scale atomic systems, however, still come with extremely high computational costs. While the recent development of supercomputers accelerated with graphics processing units (GPUs) may offer a solution to some extent, the most widely used programs for calculating the electronic structure have not yet been fully adapted to exploit this new power. With greatly increased demand for new and better materials from both science and industry, a more efficient method for investigating material properties must be implemented.

The use of machine learning (ML) to create interatomic potentials (IPs) which can outperform the classical potentials has increased strongly in recent years. With successful implementation of ML methods such as neural networks or Gaussian basis functions, the accuracy of ab-initio methods can be achieved at the price of simulations with empirical potentials. Most ML methods, however, require high accuracy data sets in order to be trained sufficiently. If no such data is available for the system of interest, the enormous cost of creating a sufficient amount of data can quickly negate the benefit of using ML.

In this diploma project, the elastic and thermodynamic properties of the Ti0.5Al0.5N random alloy at elevated temperature are therefore investigated using an Active Learning (AL) method with the Machine Learning Interatomic Potentials (MLIP) software. The obtained material properties agree well with the results from costly ab-initio studies of Ti0.5Al0.5N, at only a fraction of the cost. The AL method requires no high accuracy data sets or prior knowledge about the system, since the model is initially trained on low accuracy data, which is later removed from the training set (TS). This enables an iterative process for improving and expanding the data set used to train the IP, without the need for large-scale data sets.


Nomenclature

Abbreviations and Acronyms

Abbreviation   Meaning
AI             Artificial Intelligence
AIMD           Ab-Initio Molecular Dynamics
a.u.           Arbitrary Unit
CLTE           Coefficient of Linear Thermal Expansion
DFT            Density Functional Theory
DOS            Density of States
f.u.           Formula Unit
FCC            Face-Centered Cubic
IP             Interatomic Potential
ML             Machine Learning
MLIP           Machine Learning Interatomic Potentials
MTP            Moment Tensor Potential
RMS            Root Mean Square (Error)
SIFC           Symmetry Imposed Force Constant
SQS            Special Quasi-random Structure
TDEP           Temperature Dependent Effective Potential
TS             Training Set
VASP           Vienna Ab-Initio Simulation Package
VS             Validation Set

Greek/Latin Symbols

Symbol   Name      Meaning
a        -         Lattice parameter
C        -         Elastic (stiffness) tensor
ε        Epsilon   Strain tensor
σ        Sigma     Stress tensor
µ        Mu        Mean value
η        Eta       (Finite) distortion
α        Alpha     Coefficient of linear thermal expansion
∆        Delta     Difference wrt. a certain base value


Contents

1 Introduction
2 Elastic constants of Materials
3 Free energy of Alloys
4 Atomic Simulations
   4.1 Potentials in Molecular Dynamics Simulations
   4.2 Ab-Initio Molecular Dynamics
   4.3 Thermodynamic Ensembles and Ergodicity
5 Machine Learning
   5.1 The basics of Machine Learning
      5.1.1 Training and Validation
      5.1.2 The Bias-Variance Trade-Off
   5.2 Machine Learning of Interatomic Potentials
      5.2.1 Moment Tensor Potentials
   5.3 Active learning
      5.3.1 The MaxVol algorithm
6 Calculational Details
   6.1 List of Programs/Packages
   6.2 Structural information
   6.3 Initial Data Set
   6.4 Selection and Initial Training
   6.5 Active Learning Workflow
      6.5.1 10g Potential
      6.5.2 16g Potential
   6.6 Calculations of Elastic Constants
   6.7 Calculations of Thermal Properties
7 Results
   7.1 Elastic constants of Ti0.5Al0.5N
   7.2 Thermodynamic properties of Ti0.5Al0.5N
   7.3 Computational Cost
8 Discussion
9 Conclusion
10 Outlook
11 References


1 Introduction

In recent years, the need for novel materials with exceptional properties such as high hardness, stiffness or elasticity has grown steadily [1]. New materials for high performance hard coatings, used in large-scale industrial processes, are thus a thriving field of research [2]. Using molecular dynamics (MD), the high-temperature behaviour of any potential new material can be determined. Yet, to accurately study the intricate chemical and physical processes, Ab-Initio Molecular Dynamics (AIMD) simulations need to be run [3]. The extremely high cost of these AIMD simulations has until now been justified by the immense value of the potential materials discovered and by the rapid development of high-powered supercomputing, but scientists are continuously improving the calculational methods in order to reduce the computational demand [4].

The use of Interatomic Potentials (IPs) obtained via Machine Learning (ML) has shown great promise for both reducing the demand and retaining the accuracy of ab-initio approaches. Gaussian approximation potentials have been implemented with great success, both in accurately predicting phase transitions in single-element systems and in lowering the computational demand of simulating enormous systems by a substantial amount [5]. Further development of this method also included the opportunity to obtain information about the electronic structure of those systems, even though the approach does not explicitly treat the electrons [6, 7].

Recently, with an implementation of mathematical descriptors known as Moment Tensor Potentials (MTPs), data from high accuracy AIMD simulations has been used to train an IP which can accurately represent the quantum mechanical interactions between the atoms within the system [8, 9]. Potentials trained with this MTP method can be used similarly to empirical potentials in MD simulations, and will produce results which closely mimic the accuracy and outcome of AIMD simulations, at a fraction of the computational cost. In a recent paper, Tasnádi et al. have shown the validity of this approach, obtaining high accuracy predictions of the elastic moduli of the well known hard coating Ti0.5Al0.5N [10]. These results show that the computational cost of high accuracy predictions can be lowered by up to three orders of magnitude.

Yet, the computational effort required to build the initial data set, which was based purely upon high accuracy AIMD simulations, almost negates the advantage of this method [11]. To eliminate the need for large and expensive data sets, an Active Learning (AL) approach can instead be utilized. The main benefit of AL is that the data set is expanded through an iterative process, in which the algorithm itself chooses the most beneficial data points, thereby achieving higher accuracy with less training. In this diploma work, such an active learning setup is employed via the MLIP software and used to investigate the thermal and elastic properties of the Ti0.5Al0.5N random alloy at elevated temperature, and to compare them to earlier results [11, 10]. Focus will lie on an effective implementation of the AL approach, rather than on the fundamentals of the first-principles methods.

Throughout this thesis, the recent approach of machine-learning high accuracy AIMD data with the MLIP package will be referred to as the MLIP method, while the proposed implementation of the AL setup will be denoted the AL approach. Both, however, make use of the MLIP package and its MTP descriptors. The scope of this project can be summarized by the following distinct research questions, around which this project is built:

1. Can the AL setup be successfully used to predict properties of the Ti0.5Al0.5N random alloy at elevated temperature?

2. Does the AL method offer an efficient approach to investigating the Ti0.5Al0.5N system?

3. How does the AL setup perform with respect to accuracy, compared to alternative theoretical methods?


2 Elastic constants of Materials

Elastic properties of a material determine how internally stressed it becomes under a certain applied strain. For infinitesimal distortions, or strains, which do not push the material past its breaking point or introduce dislocations or permanent deformation, the relationship between stress and strain can be approximated as linear. Similar to a spring being contracted and expanded, the stress of a continuous material under strain is described by Hooke's law, Eq. 1. For 3-dimensional materials, stress and strain are given by the second order tensors σ and ε, and their relation is thus defined by a 4th-order tensor C, called the elastic stiffness tensor. [12]

\sigma_{ij} = \sum_{k=1}^{3} \sum_{l=1}^{3} C_{ijkl}\,\varepsilon_{kl}, \qquad i, j \in \{1, 2, 3\} \qquad (1)

With the indices {i, j, k, l} all running from one to three, the elastic stiffness tensor has 81 components in total. Due to inherent symmetry, however, this number is reduced significantly. The following symmetry relations have been shown to hold:

C_{ijkl} = C_{jikl} = C_{ijlk} \qquad (2a)

C_{ijkl} = C_{klij} \qquad (2b)

Eq. 2a shows the so-called minor symmetry relations, which follow intrinsically from the symmetric nature of the second order tensors describing the stress and strain. This reduces the number of independent components of the elastic stiffness tensor to 36, meaning that the 4th-order tensor can now be represented by a 6×6 matrix. The major symmetry relations, Eq. 2b, prevent redundancy in the remaining components, and ensure that the resulting matrix representation of the elastic stiffness tensor is symmetric, leaving only 21 independent components. [13]

With the simplification of the 4th-order elastic stiffness tensor, both the stress tensor σ and the strain tensor ε can be represented by lower order tensors as well. For calculations of elastic constants, Voigt notation is used, which instead represents both second order tensors by vectors:

\sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{pmatrix} \xrightarrow{\text{Voigt}} \begin{pmatrix} \sigma_{11} \\ \sigma_{22} \\ \sigma_{33} \\ \sigma_{23} \\ \sigma_{13} \\ \sigma_{12} \end{pmatrix} \equiv \begin{pmatrix} \sigma_{1} \\ \sigma_{2} \\ \sigma_{3} \\ \sigma_{4} \\ \sigma_{5} \\ \sigma_{6} \end{pmatrix} \qquad (3)

\varepsilon = \begin{pmatrix} \varepsilon_{11} & \varepsilon_{12} & \varepsilon_{13} \\ \varepsilon_{12} & \varepsilon_{22} & \varepsilon_{23} \\ \varepsilon_{13} & \varepsilon_{23} & \varepsilon_{33} \end{pmatrix} \xrightarrow{\text{Voigt}} \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{22} \\ \varepsilon_{33} \\ 2\varepsilon_{23} \\ 2\varepsilon_{13} \\ 2\varepsilon_{12} \end{pmatrix} \equiv \begin{pmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \varepsilon_{3} \\ \varepsilon_{4} \\ \varepsilon_{5} \\ \varepsilon_{6} \end{pmatrix} \qquad (4)

The factor two applied to the off-diagonal values of the strain tensor in Eq. 4 ensures that the validity of Eq. 1 is preserved even in Voigt notation, given that the summation over k, l includes the off-diagonal values twice. With the stress and strain tensors represented in vector form, Hooke's law can be expressed as a simple matrix-vector multiplication.

The elastic stiffness tensor can be further simplified by considering the point symmetry of the investigated material. For a highly symmetric structure, the number of independent elastic constants is reduced significantly. In a cubic crystal, for example, the elastic stiffness tensor can be defined by only three independent constants C11, C12 and C44 [12]:

C_{ij} = \begin{pmatrix} C_{11} & C_{12} & C_{12} & 0 & 0 & 0 \\ C_{12} & C_{11} & C_{12} & 0 & 0 & 0 \\ C_{12} & C_{12} & C_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & C_{44} & 0 & 0 \\ 0 & 0 & 0 & 0 & C_{44} & 0 \\ 0 & 0 & 0 & 0 & 0 & C_{44} \end{pmatrix} \qquad (5)

To simulate the strain caused in a material by the applied load, the cell of the material can be slightly distorted. To ensure that the approximation of linear elasticity still holds under these conditions, the distortion should be sufficiently small. For a finite distortion of the form

D(\eta) = I + \varepsilon(\eta) = \begin{pmatrix} 1+\eta & \eta/2 & 0 \\ \eta/2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (6)

applying a strain

\varepsilon(\eta) = \begin{pmatrix} \eta & \eta/2 & 0 \\ \eta/2 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \xrightarrow{\text{Voigt}} \begin{pmatrix} \eta \\ 0 \\ 0 \\ 0 \\ 0 \\ \eta \end{pmatrix} \qquad (7)

to a system of cubic symmetry, the full stress tensor can be obtained from calculations with, e.g., first-principles methods. With the elastic stiffness tensor simplified by the cubic symmetry, the independent elastic constants C11, C12 and C44 can be calculated from the following relationships:

\begin{cases} \sigma_1 = C_{11}\,\eta \\ \sigma_2 = C_{12}\,\eta \\ \sigma_3 = C_{12}\,\eta \\ \sigma_6 = C_{44}\,\eta \end{cases} \;\longrightarrow\; \begin{cases} C_{11} = \sigma_1/\eta \\ C_{12} = (\sigma_2 + \sigma_3)/(2\eta) \\ C_{44} = \sigma_6/\eta \end{cases} \qquad (8)
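As a minimal sketch of Eq. 8 (added for illustration, not code from the thesis), the three cubic elastic constants can be read off from a calculated Voigt stress vector; the numerical stresses below are invented:

    import numpy as np

    def cubic_elastic_constants(sigma_voigt, eta):
        """Extract C11, C12, C44 (same units as the stress) from the Voigt
        stress vector of a cubic cell distorted according to Eq. 6."""
        s1, s2, s3, s4, s5, s6 = sigma_voigt
        C11 = s1 / eta
        C12 = (s2 + s3) / (2.0 * eta)
        C44 = s6 / eta
        return C11, C12, C44

    # Hypothetical stresses (GPa) for eta = 0.02; the values are made up.
    C11, C12, C44 = cubic_elastic_constants([12.4, 3.1, 3.1, 0.0, 0.0, 3.4], 0.02)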


3 Free energy of Alloys

Crystallography, the study of materials with long- and/or short-range order on the atomic lattice, can describe a wide range of atomic systems, but many multi-element materials lack this inherent systematic arrangement of atoms. Such random mixtures containing metallic elements are called alloys; while the atoms in such systems still sit neatly at their lattice positions, the arrangement of the atomic species across the lattice sites is random. This randomness in alloy systems can pose difficulties during the characterization of the material.

To investigate and compare different systems of atoms, the correct thermodynamic potential needs to be chosen according to the surrounding conditions. This potential can then be used to determine which system is more favorable, or stable. From Density Functional Theory (DFT), the internal energy E of any system of atoms can be determined. DFT is an exact zero temperature approach, and the systems are therefore treated as static. In reality, atoms move around their equilibrium positions on the atomic lattice. These movements are highly temperature dependent, yet are present even at zero temperature, as so-called zero-point motion. [14, 15]

The thermodynamic potential of a system needs to be chosen based on the set of thermodynamic variables describing the system. For a system with constant energy, volume and number of atoms, the system can reach equilibrium by maximizing its entropy. If instead the system's temperature T is kept constant, in combination with the number of atoms and the system volume, equilibration can be achieved by minimizing the free energy:

F = E − T S (9)

Here, F is the Helmholtz free energy, T the system temperature and S the entropy of the system. This potential takes into account the effect of temperature on a system with constant volume and a fixed number of atoms. While the internal energy can be obtained from static calculations, the entropy of the system can be a challenge to calculate. In general, the contributions to the entropy are three-fold:

S = S_e + S_c + S_{vib} \qquad (10)

The first term is the electronic entropy S_e and the second term is the configurational entropy S_c, which for a binary random alloy is given by [16]:

S_c = -k_B \left[ x \ln(x) + (1 - x) \ln(1 - x) \right] \qquad (11)

where k_B is the Boltzmann constant and x denotes the concentration of one atomic species in the alloy. Which of the atomic concentrations is chosen to represent x in Eq. 11 does not matter, given that the equation is the same for both species of a binary alloy. S_c is a measure of the configurational disorder, or randomness, within the system.
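As a short added example: at the composition x = 0.5, which is the relevant case for the metal sublattice of Ti0.5Al0.5N, Eq. 11 attains the maximal configurational entropy of a binary alloy per mixing site:

S_c = -k_B \left[ \tfrac{1}{2}\ln\tfrac{1}{2} + \tfrac{1}{2}\ln\tfrac{1}{2} \right] = k_B \ln 2 \approx 0.693\, k_B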

The final contribution to the overall entropy of the system is the vibrational entropy S_vib. Stemming from the vibrational movement of atoms around their lattice positions, this contribution to the system's free energy has a great effect on material properties at elevated temperature, such as structural stability, thermal expansion or heat capacity. These quantized lattice vibrations are known as phonons. Their contribution to the free energy of the system is given by [16]:


F_{vib} = \sum_{j} \sum_{q} \left[ \frac{\hbar\omega_{jq}}{2} + k_B T \ln\!\left(1 - e^{-\hbar\omega_{jq}/(k_B T)}\right) \right] \qquad (12)

Here, the summation runs over the phonon vibrational modes j and their Brillouin zone wave vectors q. The phonon frequency is denoted ω_jq, and ħ is the reduced Planck constant. Eq. 12 does, however, not relate the phonons to the vibrational entropy S_vib directly, but rather links them to the vibrational free energy F_vib = -T S_vib instead.
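A direct transcription of Eq. 12 for a given list of phonon frequencies might look as follows (a sketch; the frequencies are invented, and a real calculation would sum over a full q-point mesh):

    import numpy as np

    HBAR = 1.054571817e-34   # reduced Planck constant (J s)
    KB = 1.380649e-23        # Boltzmann constant (J/K)

    def f_vib(omegas, T):
        """Vibrational free energy of Eq. 12 (in J) for phonon angular
        frequencies omegas (rad/s) at temperature T (K)."""
        omegas = np.asarray(omegas, dtype=float)
        zero_point = np.sum(HBAR * omegas / 2.0)
        thermal = KB * T * np.sum(np.log(1.0 - np.exp(-HBAR * omegas / (KB * T))))
        return zero_point + thermal

    # Invented example: three modes around 10 THz, evaluated at 1000 K.
    F = f_vib(2.0 * np.pi * np.array([8e12, 10e12, 12e12]), 1000.0)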

With the above considerations, the Helmholtz free energy of an alloy at elevated temperature is given by:

F = E - T S_c + F_{vib} \qquad (13)

where the electronic entropy has been neglected. This measure of free energy allows for determination of the most stable, equilibrium state of any alloy at zero pressure and elevated temperature.


4 Atomic Simulations

The need to simulate the dynamic motion of atoms in a material arises from properties which static calculations fail to capture. Many interesting material parameters, such as hardness, electronic structure or chemical properties, can be determined without considering the internal motion of atoms, even though such calculations neglect the atoms' motion around their equilibrium positions. Static models do, however, not capture properties such as thermal expansion, heat capacity or the existence of phase transitions. The study of the dynamical behaviour of atomic systems, whether or not they exhibit long-range order or inherent symmetry, is generally called molecular dynamics (MD). [17]

The atomic motions within a material are not random oscillations around the equilibrium positions, but rather reactions to the forces acting between the atoms of the system. While atoms exhibit behaviour which arises from the quantum mechanical nature of the electrons, their motion in the system is determined by purely classical mechanics; the forces acting on each atom cause it to accelerate in accordance with Newton’s laws of motion:

\frac{\partial^2 \bar{r}_j(t)}{\partial t^2} = -\frac{1}{m_j} \nabla \phi_j(\bar{r}_j, t) \qquad (14)

Here, m_j is the mass of the jth atom, r̄_j is its position, and φ_j(r̄_j, t) is the potential which stems from the atom's interactions with its surrounding atoms. When taking into account only pair-wise interactions between atoms, the potential in which each atom moves can be approximated as the sum over all individual interatomic interactions throughout the system:

\phi_j(\bar{r}_j, t) = \sum_{i} \phi_{ij}(r_{ij}) \qquad (15)

where r_ij = |r̄_i - r̄_j| is the absolute distance between the jth and ith atoms. In reality, the interactions between neighboring atoms are much more complex, and potentials which take higher-body interactions into account can be constructed, at the cost of a higher computational demand.

4.1 Potentials in Molecular Dynamics Simulations

In any atomic system, the atoms experience forces based on the potential in which they are moving, in accordance with Eq. 14. An accurate description of the potentials acting between the atoms is therefore crucial to describe the physical behaviour of any system. Due to the nature of atoms, a positively charged ion surrounded by negatively charged electrons, all interatomic potentials are effectively electrostatic at their root. While the long-range Coulomb interaction (∝ r^{-1}) between atoms is the strongest among the electrostatic forces, it alone is not sufficient to describe the physics of a system. Inductive forces, stemming from the change in the background electric field due to the movement of atoms, are a contributing factor to the interatomic potentials.

Interatomic potentials include indirect forces as well, such as the dipole-dipole (van der Waals) interactions. These forces arise due to fluctuations in the electronic density around the ions, which may cause an atom to have a spontaneous, temporary dipole moment. This moment can then interact with the surrounding atoms, causing a shift in the electronic density around the neighboring ions. Van der Waals interactions are short ranged, and are inversely proportional to the 6th power of the distance between the atoms.

If two atoms are close enough for their respective electron densities to overlap, the electrons of the atoms will start to interact directly with each other. Due to the Pauli exclusion principle, stating that two electrons with the same quantum numbers cannot occupy the same state [18], electrons are forced into higher energy states. This causes a very strong, but rather short ranged, repulsion between the atoms, commonly modelled as inversely proportional to the 12th power of the interatomic distance r.

While it is possible to calculate each of these interactions with great accuracy, it is more efficient to construct an approximate potential with model parameters which mimics the behaviour of the investigated system sufficiently well. For atoms or molecules which are electrically neutral, for example, the so-called Lennard-Jones potential is used:

\phi_{LJ}(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] \qquad (16)

Here, ε is the depth of the potential well (the minimum of φ_LJ is -ε), and σ is the distance at which the potential energy crosses zero. Classical potentials like these work surprisingly well if used with the correct model parameters in the right situations. Since the electrons of the system are not treated directly, MD simulations with these potentials are not very computationally demanding. This, however, comes at the cost of transferability between different systems: a classical potential which describes the interaction between neutral atoms well will completely fail to capture any meaningful physical behaviour in metals or in systems with ionic bonding. To obtain potentials which cover a wide range of systems, other, more complex methods need to be applied.
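As a short illustration of Eq. 16 (added here, not from the thesis), the pair energy and radial force can be evaluated as follows; the argon-like parameters are a common textbook choice, not values used in this work:

    import numpy as np

    def lennard_jones(r, eps, sigma):
        """Pair energy of Eq. 16 and the radial force F = -d(phi)/dr."""
        sr6 = (sigma / r) ** 6
        energy = 4.0 * eps * (sr6 ** 2 - sr6)
        force = 24.0 * eps * (2.0 * sr6 ** 2 - sr6) / r
        return energy, force

    # Commonly quoted argon-like parameters (an illustrative choice):
    # eps = 0.0104 eV, sigma = 3.40 Angstrom, evaluated at r = 3.8 Angstrom.
    e, f = lennard_jones(3.8, 0.0104, 3.40)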

4.2 Ab-Initio Molecular Dynamics

A more accurate approach to obtaining the forces acting on the atoms of a system is the use of ab-initio methods, such as DFT, which is then referred to as ab-initio Molecular Dynamics (AIMD). Such methods take into account not only the electrostatic forces between atoms, but also explicitly model the quantum mechanical behaviour of the electrons in the system. The AIMD approach does come with an elevated demand for computational power, but this increase in cost is generally worth it, given that AIMD simulations describe the dynamics of almost any system with great accuracy. The general scheme of an AIMD simulation is illustrated in Fig. 1.

Figure 1: General scheme for an AIMD setup using DFT to obtain accurate forces.

In AIMD, there is no need to consider the motion of both ions and electrons simultaneously; electrons are much lighter than the heavy cores of the atoms and thus move at remarkably higher speeds. It is therefore possible to assume that during the relatively slow motion of the ions, the electrons always find their equilibrium positions instantaneously. This decoupling of electron and ion movement is called the Born-Oppenheimer approximation, and is heavily used in AIMD. For each time step in the simulation, the electronic structure is relaxed to obtain the forces acting on the ions, after which the atoms are moved accordingly. This means that, from the electrons' point of view, the larger, slower ions are considered stationary while the electron wavefunctions are calculated. [19]

In DFT, the electronic ground state of the system is calculated based on functionals which depend on the spatial density of electrons n(r̄). The theory of DFT is based upon a set of fundamental theorems: the Hohenberg-Kohn theorems state that the external potential, and hence also the total energy, is a unique functional of the electron density n(r̄) of the system. Furthermore, they state that the ground state energy can be obtained variationally, given that the electron density n(r̄) that minimizes the total energy is the exact ground state density. [20] While these theorems give a description of the ground state energy of the system using only the electron density, they are exact in nature and do not allow for efficient computation in practice.


With the help of the Kohn-Sham formulation, however, an efficient and practical way of calculating the electronic structure is achieved. In the Kohn-Sham approach, the real system is replaced by a fictitious system of non-interacting electrons. This greatly reduces the complexity of the problem, and makes solving the Schrödinger equation viable even for large systems of electrons. The difference between the real and the fictitious system is accounted for in the Kohn-Sham formulation by a single term: the exchange-correlation potential. Since the true form of this potential is unknown, it is approximated using either the Local Density Approximation (LDA) or Generalized Gradient Approximations (GGA).

This approximation, however, still does not allow for an analytical solution for the system's ground state energy. Instead, a solution needs to be found using a self-consistent approach, as visualized in Fig. 2, where the process starts with an initial guess of the electronic density. This approach is called the Self Consistent Field (SCF) method [21, 22].

Figure 2: The SCF cycle. From a starting electronic density, a Hamiltonian is constructed, which in turn is used to solve for the electronic wavefunctions. From these, a new electronic density is obtained, and the cycle is repeated until convergence.

The above process is iterated until the electronic density is sufficiently converged. While the cycle can be repeated until the difference between two iterations is infinitesimally small, the inherent error due to the approximation of the exchange-correlation energy will remain. In AIMD, this electronic structure relaxation takes place at every single time step, which is the cause of the increased computational cost. Because the electrons are treated in the proper quantum mechanical way, however, the resulting forces are highly accurate, and thus lead to exact physical behaviour of the investigated atomic system, within the error of the approximated exchange-correlation potential.
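To make the cycle concrete, a toy sketch is given below; build_hamiltonian, solve_kohn_sham and density_from_orbitals are invented stand-ins for the real DFT machinery (a diagonal model Hamiltonian instead of Kohn-Sham functionals), so the sketch only illustrates the iteration structure of Fig. 2, not actual DFT:

    import numpy as np

    # Invented toy stand-ins; these do not correspond to any real DFT code.
    def build_hamiltonian(n):
        return np.diag(np.linspace(0.0, 1.0, n.size)) + 0.1 * np.diag(n)

    def solve_kohn_sham(H):
        _, vecs = np.linalg.eigh(H)
        return vecs[:, 0]                    # lowest "orbital"

    def density_from_orbitals(orbital):
        return orbital ** 2                  # normalized toy "density"

    def scf_loop(n_initial, tol=1e-6, max_iter=100):
        """Schematic SCF cycle of Fig. 2: density -> Hamiltonian ->
        wavefunctions -> new density, iterated to self-consistency."""
        n = n_initial
        for _ in range(max_iter):
            n_new = density_from_orbitals(solve_kohn_sham(build_hamiltonian(n)))
            if np.max(np.abs(n_new - n)) < tol:
                return n_new
            n = 0.5 * n + 0.5 * n_new        # simple linear mixing for stability
        raise RuntimeError("SCF did not converge")

    n_converged = scf_loop(np.full(8, 1.0 / 8.0))   # start from a uniform guess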

4.3 Thermodynamic Ensembles and Ergodicity

For MD simulations of a specific system, different information can be obtained based on which thermodynamic variables are kept constant throughout the simulation. The specific set of quantities kept constant throughout a run is referred to as a thermodynamic ensemble. If, for example, the system volume V, the number of atoms N and the total system energy E are kept constant, the MD simulation uses the so-called microcanonical ensemble. This NVE ensemble is used to accurately study the dynamical processes of a system. [17]

Simulating a system with the NVE ensemble treats the system as isolated. Since the total energy of the system is kept constant, however, any atomic relaxations which change the potential energy U of the system will also cause a change in the kinetic energy, and thus change the system temperature T. So if the system's behaviour at a certain temperature is of interest, the system can be connected to a heat reservoir which is kept at a constant temperature. This temperature-preserving canonical ensemble, or NVT ensemble, no longer preserves the total energy of the system, due to the heat exchange with the thermal reservoir. [17]


When allowing for volume relaxation during an MD simulation, the isobaric-isothermal (NpT) ensemble, also referred to as the Gibbs ensemble, is used. This ensemble can be useful to determine the expansion of a material with increasing temperature. While these different ensembles will try to keep the chosen thermodynamic quantities constant, the error in the calculation methods will inevitably cause small fluctuations around the specified values. If an MD simulation is conducted for long enough, the remaining thermodynamic variables will also start oscillating around certain values and no longer drift, at which point the system has reached equilibrium.

While it might seem correct to obtain information about the system by averaging over the time steps of the equilibrated system, this is not entirely true. Statistical mechanics shows that any such property should instead be extracted from the ensemble average, i.e. by averaging over a large number of copies of the investigated system. Thankfully, the works of Birkhoff and von Neumann have proven that the time average can be used for ergodic systems [23, 24]. Ergodicity of a dynamical system means that, over a long enough time, the system will move and evolve enough to eventually cover all parts of the phase space in which it exists. This means that for a long enough MD simulation of an equilibrated system, quantities extracted from the time average will converge to the values obtained via ensemble averaging.


5 Machine Learning

Machine Learning (ML), a branch of the vast field of Artificial Intelligence (AI), is the study of a computer's, or machine's, ability to learn how to solve tasks for which it has not been explicitly programmed. While the term was coined in the 1960s, the full potential of ML methods has only been realized in recent decades, due to the rapid advancement in computational power, better ML algorithms, and the vast databases built for learning and pattern recognition. Tasks which may once have seemed unthinkable are now part of daily life, such as accurate speech recognition, self-driving vehicles and advanced algorithms which can beat any human player at even the most complex of games [25, 26, 27].

While most ML based approaches have gained popularity only over the recent decade, the use of ML-like methods in a scientific context is much older. Searching extensive sets of data for reliable patterns which describe the laws of nature, for example, by far predates the founding of ML as a field of research. Already in the early 17th century, Johannes Kepler extracted empirical laws of planetary motion from the data set of observations made by Tycho Brahe, one of the oldest well-documented cases of pattern recognition. While the computational power available nowadays allows such tasks to be solved in mere seconds, the mathematical concepts behind some of these old methods have barely changed.

The rapid development of ML approaches has had a great impact on many scientific fields in recent years. The fine-tuning of experimental parameters can be automated using Bayesian approaches, and accurate prediction of material parameters and of the evolution of atomic systems can be achieved at a fraction of the cost required by conventional methods [28, 5]. In this section, the basics of ML are outlined, and the implementation of the MTP based MLIP approach in materials science is discussed in detail.

5.1 The basics of Machine Learning

To start any ML process, two things are required: a set of data on which to learn, and a cost function, which defines how heavily errors are penalized during the learning process. Furthermore, the type of the trained model should be chosen beforehand: whether it should yield a continuous output (regression), or characterize data into predefined classes (classification). The contents of this section are based on methods by C. Bishop [29].

For the training process, the data is split into two parts: a training set (TS) and a validation set (VS). This ensures the model's ability to generalize the data without memorizing it. If a model is too complex for the data on which it is trained, it will memorize the data points in the TS, resulting in a vanishing error when tested on the training data. When presented with new data points, however, such over-fitted models tend to fail miserably, since they have little to no ability to generalize to new, uncorrelated data. By using only part of the full data set for training, while testing errors on the remainder of the data, such over-fitting can be avoided. The data set used to obtain the ML model should therefore be large enough to handle being split into at least equal parts training and validation data. However, smart algorithms, such as cross-validation, can eliminate the need for extensive data sets, and save valuable time and resources in cases where the gathering of data is inconvenient or highly demanding.

Below, a simple case of polynomial regression is presented to explain the steps and methods of ML in more detail. First, two-dimensional data is created from 20 points in the interval x ∈ [0, 6] and their respective y-values y = sin(x), shown in Fig. 3 a). To simulate real data, Gaussian noise is added in the y-dimension, and the data set is randomly split into equal parts training and validation data, see Fig. 3 b).
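A minimal reconstruction of this toy data set (the noise amplitude and the seed are assumptions; the thesis does not state them):

    import numpy as np

    rng = np.random.default_rng(0)                  # fixed seed for reproducibility
    x = np.linspace(0.0, 6.0, 20)
    y = np.sin(x) + rng.normal(0.0, 0.15, x.size)   # assumed noise level

    idx = rng.permutation(x.size)                   # random 50/50 train/validation split
    x_train, y_train = x[idx[:10]], y[idx[:10]]
    x_valid, y_valid = x[idx[10:]], y[idx[10:]]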


Figure 3: Visualization of the original data (a) and training and validation data sets with noise (b).

5.1.1 Training and Validation

For polynomial regression, the given data is assumed to be best described by an ansatz of the type:

y(x, \bar{w}) = w_0 + \sum_{i=1}^{D} w_i x^i \qquad (17)

where D is the polynomial degree, and w̄ = {w_0, ..., w_D}^T is a vector containing the polynomial coefficients, or weights. Before the model can be trained on the TS, a cost function needs to be constructed. This function determines the cost, i.e. some measure of error, of the trained model. For this case of polynomial regression, the cost function used is the Root-Mean-Square (RMS) error function, defined by:

f_{cost}(x, \bar{w}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_{pred}(x_i, \bar{w}) - y_i \right)^2} \qquad (18)

Here, y_pred is the value predicted by the trained ML model. The training itself consists in finding the set of weights w̄ = {w_0, ..., w_D}^T that minimizes this cost function. In the case of polynomial regression with an RMS based cost function, the solution to this optimization problem can be found analytically, according to:

\bar{w} = (X^T X)^{-1} X^T \bar{y} \qquad (19)

where X ∈ R^{N×(D+1)} is the matrix containing the observations of the N independent variables in the following form:

X = \begin{pmatrix} 1 & x_1 & \cdots & x_1^D \\ 1 & x_2 & \cdots & x_2^D \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & \cdots & x_N^D \end{pmatrix} \qquad (20)

The method outlined above is known as the least-squares method of fitting a polynomial function to data. Training polynomials of different degrees on the training data shown in Fig. 3 b), a multitude of different models is obtained, some of which are shown in Fig. 4.
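Eqs. 19-20 can be implemented directly (a sketch applied to the toy data above; numpy.polyfit would give the same result):

    import numpy as np

    def fit_polynomial(x, y, degree):
        """Least-squares fit via the normal equations (Eq. 19)."""
        X = np.vander(x, degree + 1, increasing=True)   # design matrix of Eq. 20
        return np.linalg.solve(X.T @ X, X.T @ y)        # weights w0..wD

    def predict(x, w):
        return np.vander(np.atleast_1d(x), w.size, increasing=True) @ w

    # e.g. w3 = fit_polynomial(x_train, y_train, 3); y_hat = predict(x_valid, w3)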


Figure 4: Different polynomials fit to the training data by minimizing the RMS cost function. Subfigures a), b), c) and d) show results from polynomial functions of degree D = 1, 3, 6 and 8, respectively.

Directly from Fig. 4, the conclusion can be drawn that some models, such as a) and d), are less successful at capturing the trend in the data, while others, such as b) and c), show promising results. During the training of ML models, a wide range of different functions is trained, so that the best approach can be chosen based on model performance. However, simply choosing a model based on the error during the training step would be inappropriate, as can be seen in Fig. 4 d). Here, the training error is far lower than for any other model in Fig. 4, but the ML model completely fails to capture the trend of the data. This type of over-fitting and memorization of data can be avoided by validating each model on a separate, uncorrelated data set, the VS. During validation of any ML model, predictions are made for the VS, and errors are calculated according to the loss function. In the case of the simple RMS loss function defined by Eq. 18, the errors for both training and validation are displayed in Fig. 5.


Figure 5: RMS errors for both training and validation data, based on polynomial regression with different degrees.

While the RMS error for the TS is strictly decreasing with respect to the polynomial degree of the model, the error calculated on predictions made for the VS increases sharply for polynomial degrees higher than 6. It is at this stage that the trained model starts to memorize the exact data, instead of finding general trends. This issue of finding the right model complexity to minimize errors, while also maximizing the ability to generalize, is known as the bias-variance trade-off.

5.1.2 The Bias-Variance Trade-Off

The bias-variance trade-off is an optimization problem in which both the bias and the variance errors of an ML model are sought to be minimized. This, however, turns out to be a difficult task, given that bias and variance are inherently connected and problematic to decrease simultaneously.

The bias error of an ML model can be described as the nature of the assumptions made during the creation of the model. The more extensive the assumptions are, the higher the bias of the model will be. In the aforementioned example, the major assumption was that sinusoidal data could be well described by a polynomial function, something which according to Taylor's theorem only holds for a polynomial of infinite degree, and also requires data without noise. High bias leads to low variance, as seen in Fig. 4 a). Here, the assumption of linearity causes the ML model to miss the general trend of the data due to high bias, but it displays exceptionally low variance: even if the data set were extended, or perhaps even completely changed, the model would remain approximately the same negatively sloped function with a positive intercept.

The variance error, on the other hand, is the ML model's sensitivity to minor changes or fluctuations in the data set. In Fig. 4 d), the high variance comes from the increase in model complexity, and causes the ML model to almost completely learn the TS, while failing to capture the data trends during validation. Models with high variance learn the noise within the data, rather than the trends which would make the model general. This also leads to a high sensitivity with respect to the data set; extending or changing the data set will also cause a great change in the model parameters. Therefore, it is clear that neither a high bias nor a high variance is desirable. A reliable ML model should always have carefully balanced bias and variance errors, to be able to capture the general trend of the data without being too sensitive to noise or changes in the TS.

To demonstrate the above point, the example with polynomial regression is used once again. Reducing the high bias of the low order polynomial can easily be achieved by adding more parameters, i.e. increasing the polynomial order. Reducing the variance of the high order polynomials, however, is a more challenging task. One simple approach is to increase the number of points in the data set, as visualized in Fig. 6 a). Here, the number of points in the data set was doubled for the polynomial regression model of eighth order, which increased the model bias and reduced the variance, mending the oscillatory behaviour of the model.

Figure 6: Eighth-degree ML models with bias increased by a) a larger number of points in the data set, b) ridge regression with parameter λ = 0.1.

Another approach is to penalize large polynomial coefficients, or weights. Since rapidly oscillating polynomial functions have excessively large coefficients, these values can be used to suppress unwanted behaviour in the ML model. This is called ridge regression, and it redefines the cost function in the following way:

f_{cost}(x, \bar{w}) = \sqrt{\frac{1}{2N} \sum_{i=1}^{N} \left( y_{pred}(x_i, \bar{w}) - y_i \right)^2 + \frac{\lambda}{2} \|\bar{w}\|^2} \qquad (21)

This penalizes each coefficient equally, and large weights increase the cost of the ML model. The result of this method is shown in Fig. 6 b) for λ = 0.1. The resulting model has higher bias and lower variance, and captures the overall trend in the data far better than before. Controlling the bias and variance of an ML model is an important part of ensuring its ability to generalize data, without being overly sensitive either to noise in the data or to the assumptions made during model creation.
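The corresponding closed-form ridge solution is sketched below; note that conventions differ on whether λ absorbs the 1/N factor of Eq. 21, so the numerical λ is not directly comparable between implementations:

    import numpy as np

    def fit_ridge(x, y, degree, lam):
        """Ridge-regularized polynomial fit: penalizing the weight norm
        turns the normal equations of Eq. 19 into (X^T X + lam*I) w = X^T y."""
        X = np.vander(x, degree + 1, increasing=True)
        return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

    # e.g. w8 = fit_ridge(x_train, y_train, 8, lam=0.1)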

5.2 Machine Learning of Interatomic Potentials

Approximate potentials in MD simulations are a widely used tool to decrease the computational demand of otherwise costly calculations, but they do not come without drawbacks. A given approximate potential is tailored to a specific situation or system, and cannot easily be transferred to other cases. These IPs are used to predict the potential energy of the atomic interactions in the system, along with the forces acting between atoms. With ML, such potentials can be constructed based on a set of atomic configurations for which all the important information, i.e. energies, forces and stresses, is known.

One of the early successful implementations of ML based IPs was a generalized neural network approach to model the potential energy surface of bulk silicon [30]. The resulting IP was not only in good agreement with DFT calculations, but, due to linear scaling with system size, could also be used to predict properties of large systems efficiently. Another successfully implemented approach is the use of Gaussian approximation potentials, which were used to accurately model the potential energy surfaces of single-element bulk phases [5]. This approach has shown exceptional scaling capabilities, as recent work showed that these potentials can be used to efficiently model structural transitions in systems with sizes of up to 100 000 atoms [6].


Similarly, the Machine Learning Interatomic Potentials (MLIP) package enables the creation of IPs based on the systematically improvable Moment Tensor Potentials (MTPs), rather than on Gaussians or neural networks. While the viability of this approach has undoubtedly been proven [10], the range of applications in which MLIP might improve overall efficiency has not yet been fully explored.

5.2.1 Moment Tensor Potentials

The Moment Tensor Potentials (MTPs) used in the MLIP package are potentials based on moment tensor descriptors of the atomic environments in the system. Their description and the theory behind them are based on the work of Novikov et al. and Shapeev [9, 8]. For a given configuration of atoms, the total energy can be calculated by summing over the individual contributions of each atom and its local environment:

E_{MTP} = \sum_{i=1}^{N} V_{IP}(n_i) \qquad (22)

Here, n_i denotes the local environment of the ith atom, and the summation extends over all N atoms in the system. The interatomic potential V_IP(n_i) can in turn be expanded as follows:

V_{IP}(n_i) = \sum_{\alpha} \xi_{\alpha} B_{\alpha}(n_i) \qquad (23)

For a given set of basis functions B_α, the coefficients ξ̄ = {ξ_1, ..., ξ_α} of this linear expansion can be found by fitting them to the atomic configurations in the TS.

The basis functions used in MTPs are constructed from the moment tensor descriptors, which form the core of the MLIP software. The moment tensor descriptors, or moments, are invariant with respect to permutation, rotation or reflection, and are defined in the following way:

M_{\mu,\nu}(n_i) = \sum_{j} f_{\mu}(|r_{ij}|, z_i, z_j)\, \bar{r}_{ij}^{\otimes\nu} \qquad (24)

Here, the index j runs over all adjacent atoms within the cutoff radius R_cut, and z_i, z_j denote the atomic types of the respective atoms. The moments are built from a radial and an angular part. The angular part, r̄_ij^{⊗ν}, is simply the outer product of the vector connecting the ith and jth atoms, applied ν times. This results in a tensor of rank ν containing angular information about the atomic neighborhood of the ith atom. The radial part of the moment is defined as follows:

f_{\mu}(|r_{ij}|, z_i, z_j) = \sum_{\beta} c^{\beta}_{\mu, z_i, z_j} Q^{\beta}(|r_{ij}|) \qquad (25)

The Q^β(|r_ij|) are the radial basis functions, which within the cutoff radius R_cut are defined as:

Q^{\beta}(|r_{ij}|) = p^{\beta}(|r_{ij}|) \left( R_{cut} - |r_{ij}| \right)^2 \qquad (26)

Outside of the cutoff radius R_cut, these radial basis functions are zero, meaning that no interaction occurs with atoms beyond the cutoff sphere. The p^β(|r_ij|) are polynomial functions between the minimal atomic distance R_min and R_cut.

In order to confine the set of chosen basis functions B_α for a given MTP, the level of a moment is defined as:

\mathrm{lev}(M_{\mu,\nu}) = 2 + 4\mu + \nu \qquad (27)

The level of any contraction or product of moments is additive, meaning that the resulting level is simply the sum of the levels of the contracted moments. For example, lev(M_{0,1}) = 3 and lev(M_{2,1}) = 11, and thus the level of their contraction is lev(M_{0,1} · M_{2,1}) = 14. To obtain a certain MTP, the set of basis functions is then confined to lev(B_α) ≤ lev_max. This means that all moments, and all products or contractions of moments which produce a scalar output, with a level lower than or equal to lev_max are included in the set of basis functions.

For this project, the 10g (lev_max = 10) and 16g (lev_max = 16) MTPs were both used, at different stages of the training.
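To illustrate the structure of Eq. 24, a toy sketch follows; the radial part is simplified to f_mu(r) = r^mu (R_cut - r)^2 for a single species, replacing the full expansion of Eq. 25, and the neighbor vectors are invented:

    import numpy as np

    def moment(neighbors, mu, nu, r_cut=5.0):
        """Toy moment tensor M_{mu,nu} (Eq. 24) for one atom, using a
        simplified single-species radial part instead of Eq. 25."""
        M = 0.0
        for rij in neighbors:                # vectors to neighbors of the atom
            r = np.linalg.norm(rij)
            if r >= r_cut:                   # no interaction beyond the cutoff
                continue
            angular = 1.0                    # nu = 0 gives a scalar moment
            for _ in range(nu):              # apply the outer product nu times
                angular = np.multiply.outer(angular, rij)
            M = M + (r ** mu) * (r_cut - r) ** 2 * angular
        return M

    # Two invented neighbor vectors (Angstrom); nu = 2 yields a rank-2 tensor.
    nbrs = [np.array([2.1, 0.0, 0.0]), np.array([0.0, 2.1, 0.3])]
    M02 = moment(nbrs, mu=0, nu=2)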

5.3 Active learning

Conventional ML can be viewed as a form of passive learning, where the algorithm, or ML model, is repeatedly fed data until some convergence criterion is achieved. The model itself cannot decide whether to ignore certain points, or ask for new data in regions of uncertainty. Active Learning (AL) fixes this by introducing the model to an unlabelled data set, from which the model can choose to query points it deems worth investigating. The true strength of this approach is clear, given that any learner, whether an algorithm or a biological being, strongly benefits from the ability to explore, ask questions and gather new data during the learning process. [31, 32]

To fully utilize the ability to extend and improve the available data set, the AL model needs to determine the uncertainty in the predictions it makes, and have access to a so-called oracle, or teacher, which will accurately label the newly chosen data points. The general idea of an AL setup is illustrated in Fig. 7.

Figure 7: The active learning setup. The pre-trained, initial model is improved upon iteratively by expansion of the data set from which it learns. The oracle, marked in red, is responsible for providing the true label of data which may improve the performance of the model.

The iterative process of AL starts with a pretrained model, which is either trained on low accuracy data or on a data set too small to ensure the model's ability to fully generalize the trend in the data. In some cases, this model can also be randomly initialized, or simply left completely blank, which eliminates the need for any initial training data. This model is then used to make predictions on unlabelled data on which it has not yet been trained. In this step, the model needs a way to estimate either the errors in its predictions, or the extrapolation grade, i.e. how far out of its comfort zone it has stepped. These values can then be used to sort the data on which the predictions were made, with the most uncertain points being the prime candidates for expansion of the TS.

A small subset of these data points is then chosen, and a query is made to the oracle, which provides the true labels for the data. After expanding the TS with the newly labelled data, the model can be retrained, and will show improved performance when making predictions close to the points which have been added to the TS. By repeating this process of labelling high error data points and expanding the TS, the parameter space can be scanned for the type of data the model is supposed to predict. Ultimately, the model will no longer extrapolate or have high errors in its predictions, and will have gained the ability to accurately reproduce trends in the data.

In the AL approach discussed in this project, the MLIP software creates a potential which should accurately model the energies, forces and stresses of a given atomic configuration. The unlabelled data points are created by letting the atomic system evolve according to Newton's laws of motion, as discussed in Sec. 4. Each time the MLIP software makes a prediction, the extrapolation grade of that prediction is calculated using the so-called MaxVol algorithm, thereby enabling effortless selection of viable data points for expansion of the TS.
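The iteration just described can be condensed into a schematic loop; grade, oracle and retrain are hypothetical callables standing in for the MaxVol extrapolation grade, the AIMD/DFT labelling and the MLIP fitting step, and the threshold value is invented:

    def active_learning_loop(potential, configs, grade, oracle, retrain,
                             threshold=2.0, max_iter=50):
        """Schematic AL loop: label only configurations whose extrapolation
        grade exceeds a threshold, then retrain on the grown training set."""
        training_set = []
        for _ in range(max_iter):
            selected = [c for c in configs if grade(potential, c) > threshold]
            if not selected:          # no extrapolation left: model has converged
                return potential
            training_set += [oracle(c) for c in selected]   # e.g. DFT labels
            potential = retrain(potential, training_set)
        return potential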

5.3.1 The MaxVol algorithm

In the active learning setup of the MLIP software, the MaxVol algorithm is used to determine the extrapolation grade of any processed configuration. It does so by calculating the change in the determinant, or volume, of the matrix containing the fitted coefficients of the data in the TS. However, for a rectangular matrix A ∈ R^{n×m}, m ≠ n, the determinant det(A) is not defined. Nevertheless, by creating a well-conditioned square submatrix within A itself, a determinant can be calculated. To ensure the quality of the chosen submatrix, its volume should be maximized over the full set of possible submatrices of A.

To achieve this in a computationally effective way, the MaxVol algorithm can be used. Starting with the rectangular matrix A ∈ R^{n×m}, the initial submatrix A_sub ∈ R^{m×m} is defined as follows:

A = \begin{pmatrix} A_{sub} \\ C \end{pmatrix} \qquad (28)

Here, C is the lower submatrix of A, with a size of (n − m) × m. A new matrix B is created by multiplying A with the inverse of A_sub:

B = A \cdot A_{sub}^{-1} = \begin{pmatrix} I_{m \times m} \\ Z \end{pmatrix} \qquad (29)

By definition, the upper m × m part of B is the identity. To improve upon the chosen submatrix and maximize its volume (determinant), the lower submatrix Z can be searched for values |b_ij| > 1. Upon finding a value for which this condition holds, the corresponding rows of A are switched, and the process is repeated. While this can be continued until the submatrix with the true maximum volume is found, it is wise to incorporate a stopping criterion once a sufficiently good submatrix has been found: stopping once no |b_ij| exceeds 1 + δ, with δ ∼ 10^{−2}, saves substantial computational resources, especially for large matrices, while still yielding a satisfactory submatrix.

With each configuration in the training set accounting for a linear equation connecting the basis functions to the configurational energy, an overdetermined set of linear equations is formed. This set of equations can be written in the form:

E_{cfg} = B\,\Theta \qquad (30)

where E_cfg and Θ are vectors, and B ∈ R^{n×m} is a tall matrix containing the fitted coefficients obtained from each of the n configurations. Using the MaxVol algorithm, the volume of this tall matrix can be monitored as new configurations arrive: for each new configuration, the basis functions are fitted to its energy, and a row vector containing these coefficients is added to the matrix B. Calculating the new volume of B relative to the old volume before adding this new configuration yields the degree of extrapolation.

An increase in matrix volume indicates that extrapolation is taking place when estimating the configuration's energy landscape, while a decrease in matrix volume is evidence of interpolation between already known configurations. While classical machine learning treats any sort of extrapolation as highly unreliable, the MTPs used in the MLIP software are fairly robust, and "safe" extrapolation can be achieved for relatively high extrapolation grades.
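A compact sketch of this row-swapping scheme is given below (illustrative only; the actual MLIP implementation differs and operates on the MTP coefficient matrix):

    import numpy as np

    def maxvol(A, delta=1e-2, max_iter=200):
        """Sketch of MaxVol row selection for a tall matrix A (n x m, n > m):
        returns indices of m rows forming a well-conditioned square submatrix.
        Assumes the initial top block is non-singular."""
        n, m = A.shape
        rows = np.arange(n)                  # rows[:m] index the current submatrix
        for _ in range(max_iter):
            B = A[rows] @ np.linalg.inv(A[rows[:m]])    # top m x m block is identity
            Z = B[m:]                                   # lower block of Eq. 29
            i, j = np.unravel_index(np.argmax(np.abs(Z)), Z.shape)
            if np.abs(Z[i, j]) <= 1.0 + delta:          # stopping criterion
                break
            rows[j], rows[m + i] = rows[m + i], rows[j] # swap the offending row in
        return rows[:m]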


6 Calculational Details

In the following section, the calculational details which led to the results obtained during this project are explained. Workflows and calculational setups for all important steps along the way are illustrated and/or described in detail.

Calculations described in Sec. 6.3 were carried out on the Beskow supercomputer at the PDC Center for High Performance Computing at the KTH Royal Institute of Technology. The remaining calculations were carried out on the Tetralith supercomputer at the National Supercomputer Centre (NSC) at Linköping University.

6.1 List of Programs/Packages

Vienna Ab-Initio Simulation Package (VASP). VASP is a licensed and well-established software package developed at the University of Vienna, using projector augmented waves for DFT based electronic structure calculations. The methods implemented in the VASP code were used for calculations throughout this project. [33]

For any static calculations involved in the AL process for the Ti0.5Al0.5N structure with VASP, the same setup was used throughout the thesis. For these calculations of the 128 atom structure, a 5 × 5 × 5 k-mesh was used, and the energy cutoff was set to 600 eV. Perdew-Burke-Ernzerhof (PBE) potentials were used in combination with blocked Davidson optimization and a convergence criterion of E_diff = 10^-5 [34]. Similarly, Γ-point AIMD simulations were carried out using VASP with a comparable setup, only lowering the minimum number of electronic steps to four.
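For concreteness, the static-run settings above might translate into an INCAR along the following lines (an assumed reconstruction for illustration, not the author's actual input; only the cutoff, convergence criterion and diagonalization scheme follow directly from the text):

    # INCAR sketch (static runs); values taken from the text, tags assumed
    ENCUT = 600        ! plane-wave cutoff (eV)
    EDIFF = 1E-5       ! electronic convergence criterion
    ALGO  = Normal     ! blocked Davidson diagonalization
    # The Γ-point AIMD runs additionally imply NELMIN = 4 (minimum electronic steps).
    # KPOINTS: a regular 5 x 5 x 5 mesh (centering not stated in the text).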

Machine Learning Interatomic Potentials (MLIP). The MLIP package is a scientific software for the machine learning of IPs based on MTP descriptors. The software was developed by the group of Alexander Shapeev at the Skolkovo Institute of Science and Technology (Skoltech). Note that a newer version of this software, MLIP 2, was recently released [9]. For this project, the older version of MLIP was used. [35, 8]

Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). The LAMMPS software is an open-source classical MD code with focus on materials modelling. In this project, both the LAMMPS code itself and the freely available Python API were used. [36, 37]

Phonopy. The Phonopy Python package is used for phonon calculations at the harmonic and quasi-harmonic levels. In this project, Phonopy is used together with other Python packages to calculate the phonons of Ti0.5Al0.5N. [38]

PhonoLAMMPS. The phonoLAMMPS package for Python is the LAMMPS interface for Phonopy. It is used to calculate the harmonic phonon properties of Ti0.5Al0.5N from MD simulations.

DynaPhoPy. The DynaPhoPy package uses LAMMPS MD trajectories to calculate phonon linewidths, frequency shifts and anharmonic effects. Its implementation of the velocity autocorrelation function also allows for calculation of the phonon DOS. [39]

6.2 Structural information

The structure used to model the properties of the Ti0.5Al0.5N random alloy was generated using the Special Quasirandom Structure (SQS) method, and is visualized in Fig. 8 [40]. This ensures that the physical behaviour of the chosen structure mimics the average behaviour of a truly random alloy.


Figure 8: Crystal structure of the cubic 128-atom supercell used to start the AL process and to calculate the elastic and thermodynamic properties. The structure was created using the SQS method.

Using the Alloy Theoretic Automated Toolkit (ATAT) code, the Short Range Order (SRO) parameters of the Ti0.5Al0.5N structure were calculated for the first eight shells, see Tab. 1. Values of 1.0 depict total order in the shell, while values of 0.0 stand for a random distribution of atoms in the shell. The clustering of atoms is described by negative values of the SRO parameter.
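For reference, the pair SRO parameter is commonly defined in the Warren–Cowley form (quoted here only as a reminder of the general concept; sign and normalization conventions differ between implementations, and the values in Tab. 1 follow the convention of the ATAT output):

    \alpha_j = 1 - \frac{P_j^{AB}}{c_B}

where P_j^{AB} is the conditional probability of finding a B atom in coordination shell j around an A atom, and c_B is the overall concentration of B atoms; \alpha_j = 0 corresponds to an ideal random alloy.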

Table 1: SRO parameters of the Ti0.5Al0.5N structure, calculated using the ATAT code.

Shell |     1    |    2    |    3    |    4    |     5    |    6    |    7    |    8
SRO   | -0.12500 | 0.16667 | 0.04167 | 0.00000 | -0.12500 | 0.00000 | 0.04167 | 1.00000

The SRO parameters of this SQS show that the ordering of atoms in these first shells is not entirely random, and better SRO parameters could be achieved with an SQS of higher quality. However, the structure used in this project is the same as in the works by Tidholm and Shulumba [11, 41]. To compare the proposed AL method for the investigation of Ti0.5Al0.5N to these previous approaches in a consistent way, the same SQS was used in this project.

6.3 Initial Data Set

During the active learning process, a trained potential is improved upon by repeatedly adding more relevant data to the TS used for training. For this project, the initial TS is created based on Γ-point (low accuracy) AIMD simulations, run for Ti0.5Al0.5N at all desired temperatures. Using the Vienna Ab-initio Simulation Package (VASP), Γ-point AIMD simulations are run following the workflow illustrated in Fig. 9.

The lattice parameter of the unit cell is determined by linear interpolation between the 0 K lattice parameters of TiN and AlN. By accounting for thermal expansion, the equilibration time for the AIMD simulations can be shortened significantly. Once again, interpolation between the coefficients of linear thermal expansion (CLTE) of TiN and AlN is used to obtain an estimate of the CLTE of Ti0.5Al0.5N [42, 43].
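A minimal sketch of this estimate is given below; the numerical values are placeholders for illustration, not the lattice parameters or CLTEs used in this work:

    # Placeholder 0 K lattice parameters (Angstrom) and CLTEs (1/K) for the
    # cubic phases of TiN and AlN; illustrative values only.
    A0_TIN, A0_ALN = 4.25, 4.05
    CLTE_TIN, CLTE_ALN = 9.0e-6, 6.0e-6

    def lattice_parameter(T):
        """Estimate a(T) for Ti0.5Al0.5N by 50/50 linear interpolation."""
        a0 = 0.5 * (A0_TIN + A0_ALN)         # interpolated 0 K lattice parameter
        alpha = 0.5 * (CLTE_TIN + CLTE_ALN)  # interpolated CLTE
        return a0 * (1.0 + alpha * T)        # linear thermal expansion from 0 K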


Figure 9: Workflow for creation of the initial data sets for training and validation. Lattice parameters of the B1 phases are denoted by a, while α denotes the coefficients of linear thermal expansion (CLTE).

After expanding the structure according to the CLTE and the respective temperature, a short NVT AIMD simulation of 1 000 time steps is run. This provides reasonable atomic positions and velocities, since starting from a perfect crystal with an NpT ensemble can lead to long equilibration times. Using these values as the starting point for an NpT AIMD simulation, the system reaches equilibrium within 1 ps, or about 2 000 time steps.

To construct the data set used to initialize the active learning setup, the configurational data is extracted from each of the finished NpT AIMD simulations. Each time step corresponds to a single data point, which contains the atomic positions and forces, the lattice vectors, and the system energy and stresses. With over 7 000 time steps in each of the five simulations, the initial data set thus has a size of over 35 000 data points.

This initial data set, however, must first be split into subsets for training (TS) and validation (VS). This is done following the illustration in Fig. 10. To avoid capturing unphysical behaviour, the first 2 000 data points (1 ps) of each simulation are discarded. This ensures that no data points containing information about the non-equilibrium system are used for training. The next 2 000 points are added to the initial training set, which thus contains 10 000 data points in total. The following 1 000 data points are again discarded, in order to avoid correlation between the data points used for training and validation. Data from any remaining time steps are used to construct the validation set, as in the sketch below.
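A compact version of this split, for the trajectory of a single temperature (a sketch only; the handling of frames in practice depends on the file format used):

    def split_trajectory(frames):
        """Split one NpT AIMD trajectory (list of per-step data points)
        into training and validation data, following Fig. 10."""
        train = frames[2000:4000]  # skip 2 000 equilibration steps, keep 2 000
        valid = frames[5000:]      # a 1 000-step gap decorrelates the two sets
        return train, valid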


Figure 10: Schematic of how the full data set of configurations for each temperature is split into training and validation sets.

6.4 Selection and Initial Training

After obtaining a relatively large initial data set to train on, the initial training of the IP can begin. This early stage of training already utilizes some active learning tools, and is illustrated in Fig. 11. From the full TS, containing 10 000 data points in total, 100 points are selected at random. Using this subset of the full TS, three separate potentials are trained based on a low order 10g MTP. It is beneficial to train multiple potentials, since each is randomly initialized during the training process, which can lead to different results for the same TS.

After training, the errors of the three potentials are evaluated using the full VS. A large data set does not greatly affect the speed of validation, given that it is a far less complex operation than the training of the potentials. The best potential, based on the error of choice, is then chosen to conduct an active selection of new data points from the full TS. During this active selection, the MLIP software adds the data points which it deems most beneficial for the training of the next potential. The steps of training, validation and active selection are then repeated until a TS of sufficient size has been achieved to start the AL process in connection to an oracle; a schematic version of this loop is given below. For this project, 50 new data points were added in each iteration, until a TS of 200 data points was achieved. It should also be noted that in order to increase the initial capability of the trained IP, the starting configuration for the LAMMPS simulations should be added to the TS. This vastly reduces the extrapolation grade of early attempts at capturing the system's evolution, at the mere cost of a single VASP calculation.
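The loop can be summarized as follows. This is a schematic driver only: train_mtp, validation_error and active_select are hypothetical stand-ins for wrappers around the MLIP binaries, not real MLIP APIs:

    import random

    def initial_training(full_ts, vs, train_mtp, validation_error, active_select,
                         start=100, step=50, target=200, n_pots=3):
        """Schematic driver for the initial training stage (Fig. 11)."""
        subset = random.sample(full_ts, start)
        while True:
            # Train several randomly initialized low order (10g) potentials.
            pots = [train_mtp(subset, seed=s) for s in range(n_pots)]
            # Keep the potential with the lowest error on the full VS.
            best = min(pots, key=lambda p: validation_error(p, vs))
            if len(subset) >= target:
                return best, subset
            # Let MLIP's active selection pick the most informative points.
            subset += active_select(best, full_ts, n_new=step)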


Figure 11: Workflow for creation of the initial training set used to train the first IP used for the AL process.

For this project, a random configuration from the 900 K ”buffer” was chosen as the starting configuration for all temperatures. After a single VASP calculation, this configuration was added to the TS to ensure that the extrapolation grade of the starting configuration was below the breaking threshold.

6.5 Active Learning Workflow

The vast majority of the calculations needed to obtain a sufficiently trained potential comes from the active learning workflow itself, which is illustrated in detail in Fig. 12. Each iteration starts by using the previously trained IP to run MD simulations in LAMMPS. For each temperature, 16 simulations are started in parallel, each using a different seed to randomly initialize the atomic velocities. The time step for these simulations is 1 fs, with a maximum of 20 000 time steps.


Figure 12: Workflow of a single iteration of the AL process.

During these runs, the MLIP software closely monitors every new configuration obtained after each time step and calculates the respective extrapolation grade. If a new configuration has an extrapolation grade above the selection threshold, it is added to the set of potential new configurations to learn from. If the extrapolation grade is too large, i.e. above the breaking threshold, the LAMMPS simulation is stopped. The resulting set of collected data points is rather large, containing thousands of points, but as the potential becomes better trained, the overall extrapolation grade drops and fewer configurations are selected. Fig. 13 depicts the selection and breaking thresholds used for this project: any extrapolation grade below the selection limit is treated as a highly reliable prediction, while configurations between the two limits are considered interesting candidates for expanding the training set. The thresholds were chosen based on recommendations in the work by Novikov et al. [9].


Figure 13: Selection rules based on the extrapolation grades obtained via the MaxVol algorithm. Only configurations between the selection and breaking thresholds are considered as valid candidates for expansion of the TS. [9]

Next, a subset of the actively selected data points is randomly chosen to expand the TS. For this project, the selection limit was set to a maximum of 10 new configurations in each iteration. Each of the selected configurations is then converted into a POSCAR structure file, and a static VASP calculation is carried out for each. The setup used results in accurate forces, energies and stresses for each of the new configurations, which are then added to the TS. After this, the IP is retrained on the newly expanded TS to complete the iteration; the loop is sketched below.
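A schematic sketch of one iteration follows; run_md, dft_single_point and retrain are hypothetical wrappers around LAMMPS+MLIP and VASP, not real APIs:

    import random

    def al_iteration(pot, train_set, run_md, dft_single_point, retrain,
                     n_runs=16, max_new=10):
        """One iteration of the AL loop in Fig. 12 (stand-in callables)."""
        candidates = []
        for seed in range(n_runs):
            # MD with MLIP monitoring; run_md returns the configurations whose
            # extrapolation grade fell between the two thresholds.
            candidates += run_md(pot, seed=seed, max_steps=20000)
        if not candidates:
            return pot, train_set, True   # converged: nothing left to learn
        new_cfgs = random.sample(candidates, min(max_new, len(candidates)))
        # Static DFT yields accurate energies, forces and stresses.
        train_set = train_set + [dft_single_point(c) for c in new_cfgs]
        return retrain(pot, train_set), train_set, False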

The iterative AL process is continued until the active collection and selection no longer find any configurations of interest. Improving the potential beyond that point can be done either by changing to a higher order MTP and repeating the active learning process, or by adjusting the selection threshold to obtain a denser grid of data points in the TS. For this project, two different MTPs were used throughout the active learning procedure: the 10g and 16g MTPs, in order of increasing complexity and number of parameters.

6.5.1 10g Potential

The 10g potential, with its 232 parameters in total, is the lowest complexity MTP used for this project. Trained on the initial TS, consisting mostly of low accuracy AIMD data points, this IP is sufficient to provide a basis from which to improve and evolve the potential. With the selection and breaking thresholds set as described above, the AL process for this potential should finish in 20 iterations or fewer.

6.5.2 16g Potential

The 16g potential, trained on the TS from the fully trained 10g MTP, requires multiple iterations of the active learning process before a single MD run completes without breaking. This potential has 380 parameters in total, and thus requires more data to be sufficiently trained. While the active learning for this potential may be completed within 20 iterations, the removal of low accuracy data from the TS prolongs the process significantly.

The initial 200 data points of low accuracy AIMD configurations need to be removed in order to ensure the accuracy of the final trained IP. Removing all 200 points at once would cause the IP to break immediately after the subsequent training step, so the data should be removed in batches. This allows the potential to "repair" the parts of the TS which were removed, replacing them with high accuracy data points. For this project, the low accuracy data was removed in two batches of 100 points at a time, with sufficient active learning iterations run between the removals to allow for replacement of the lost data.

Tests run for the final 16g potential yielded results of sufficient accuracy, and no higher order potential was trained. Further continuation of the active learning workflow with the 20g potential would have given diminishing returns for the amount of computational resources used. Analysis of the variance of the final TS used for training of the 16g potential confirmed that the number of iterations was sufficient to capture the physical behaviour of Ti0.50Al0.50N in the investigated temperature range.

6.6 Calculations of Elastic Constants

The elastic constants of Ti0.5Al0.5N were calculated according to the theory of linear elasticity, as described in Sec. 2, using the final 16g potential. The FCC crystal structure has an inherent three-fold rotational symmetry along the [111] axis, but due to the randomness of the SQS structure used in this project, that symmetry is broken. The elastic stiffness tensor of the perfect cubic system given in Sec. 2 thus no longer has only three independent elastic constants. To obtain the true elastic properties of the Ti0.5Al0.5N random alloy, calculations would have to be carried out for a massive supercell.

Fortunately, results by Tasnádi et al. have shown that the arithmetic average of the elastic constants converges to the true values of Cij, since the material exhibits strong cubic symmetry despite the random distribution of titanium and aluminum atoms [44]. For that reason, elastic constant calculations were carried out for all three symmetry rotations φ[111] of the FCC lattice, as illustrated in Fig. 14, and the results averaged. This ensures a more accurate prediction of the elastic constants of the Ti0.5Al0.5N random alloy.

Figure 14: Visualization of the rotations along the [111] direction.

The process of extracting the elastic constants of Ti0.5Al0.5N is visualized in Fig. 16. Using the fully trained 16g potential, a LAMMPS NpT simulation is run for a given rotation and temperature, from which the equilibrium volume is obtained. The structure is then distorted, or strained, according to Eq. 6, for five different values of η = −0.02, ..., 0.02. The nature of these distortions is illustrated in Fig. 15.

Figure 15: Illustration of the finite distortions η applied to the crystal structure (note: the distortions in the figure are greatly exaggerated, η = 0.1, for the purpose of visualization). The applied distortions do not conserve the volume of the crystal cell.


Figure 16: Workflow of calculations for extraction of the elastic constants of Ti0.5Al0.5N. For a given trained potential, this set of simulations has to be run for each temperature T and rotation φ[111]. The obtained Cij values are averaged over the different rotations.

By running NVT simulations for the distorted structure, the average stress tensor for each distortion can be calculated. Before averaging the stresses, the first 5 000 time steps are removed, to ensure that only data from the system at equilibrium is used for calculation of the elastic constants. Using the relationship between stress and strain via Hooke's law, the elastic constants can be obtained by differentiation. For this purpose, the five-point midpoint formula is used, which approximates the derivative of a function according to:

f'(x_0) = \frac{1}{12h}\left[ f(x_0 - 2h) - 8f(x_0 - h) + 8f(x_0 + h) - f(x_0 + 2h) \right] \qquad (31)

Here, h denotes a small deviation from the midpoint x_0. Using Eq. 31 to differentiate stress as a function of strain (distortion), all elastic constants can be determined. For a given trained potential, this process needs to be repeated for each temperature at every rotation; fully extracting the elastic properties of Ti0.5Al0.5N thus requires running the workflow illustrated in Fig. 16 fifteen times. These calculations, however, are cheap to run, since the selection process is no longer active when using LAMMPS with the trained 16g potential. The decrease in computational demand is large enough to allow these calculations to be carried out without access to a supercomputer. A minimal numerical sketch of the differentiation step is given below.
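The sketch applies Eq. 31 to the time-averaged stresses, assuming the five distortions η = −0.02, −0.01, 0, 0.01, 0.02 (so h = 0.01); the conversion of stress units and the assembly of the full Cij tensor are omitted:

    import numpy as np

    def elastic_constant(stresses, h=0.01):
        """Five-point midpoint derivative (Eq. 31) of one time-averaged
        stress component with respect to strain.

        stresses : five values for eta = -2h, -h, 0, +h, +2h.
        """
        s = np.asarray(stresses, dtype=float)
        return (s[0] - 8.0 * s[1] + 8.0 * s[3] - s[4]) / (12.0 * h)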

6.7 Calculations of Thermal Properties

Calculations of the thermal properties of Ti0.5Al0.5N were carried out according to the workflow illustrated in Fig. 17. The process starts again by running an NpT simulation with the final IP in LAMMPS to obtain the equilibrium system volume at all temperatures. Based on this volume, long NVT simulations are run for 200 000 time steps. During these simulations, the velocities of every atom are saved at every single time step.


Figure 17: Calculation workflow to obtain information about the thermal properties of Ti0.5Al0.5N.

The extracted velocities are used to obtain information about the phonons in the system via the velocity autocorrelation function implemented in the DynaPhoPy and PhonoLAMMPS packages [39]. To avoid convergence issues and imaginary frequencies, however, it is crucial to ensure that the velocities used for this are taken from an equilibrated system. For this reason, the first 20 000 time steps of the NVT simulation are removed. Directly from the DynaPhoPy calculations, the phonon DOS and the vibrational free energy of the system are obtained at each temperature.

While the LAMMPS simulations with the final potential are not very computationally demanding, as mentioned in the previous section, the extraction of thermal properties requires access to some computational power. The cost of the velocity autocorrelation calculations implemented in DynaPhoPy increases quickly with the number of time steps used; however, a high number of time steps should be included in these calculations to avoid convergence issues. The Fourier-transform relation between the velocity autocorrelation function and the phonon DOS is sketched below.
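This is a toy example only; DynaPhoPy's actual implementation is considerably more sophisticated:

    import numpy as np

    def phonon_dos_from_velocities(v, dt):
        """Toy phonon DOS from atomic velocities via the power spectrum.

        v  : array of shape (n_steps, n_atoms, 3) with atomic velocities
        dt : MD time step in seconds

        By the Wiener-Khinchin theorem, the Fourier transform of the velocity
        autocorrelation function equals the power spectrum computed here.
        """
        v = v - v.mean(axis=0)                     # remove any drift
        spec = np.abs(np.fft.rfft(v, axis=0))**2   # per-atom, per-component spectra
        dos = spec.sum(axis=(1, 2))                # sum over atoms and components
        freqs = np.fft.rfftfreq(v.shape[0], d=dt)  # frequencies in Hz
        return freqs, dos / dos.sum()              # normalized DOS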

References
