
UPTEC F 21044

Degree project, 30 credits, June 2021

Inverse Uncertainty Quantification using deterministic sampling

An intercomparison between different IUQ methods

Hjalmar Andersson


Abstract

Inverse Uncertainty Quantification using deterministic sampling

Hjalmar Andersson

In this thesis, two novel methods for Inverse Uncertainty Quantification are benchmarked against the more established methods of Monte Carlo sampling of output parameters (MC) and Maximum Likelihood Estimation (MLE). Inverse Uncertainty Quantification (IUQ) is the process of estimating the values of the input parameters in a simulation, and the uncertainty of said estimation, given a measurement of the output parameters.

The two new methods are Deterministic Sampling (DS) and Weight Fixing (WF). Deterministic Sampling uses a set of sampled points chosen such that the set has the same statistics as the output. For each point, the corresponding point of the input is found, making it possible to calculate the statistics of the input. Weight Fixing uses random samples from the rough region around the input to create a linear problem that involves finding the right weights so that the output has the right statistics.

The benchmarking of the four methods shows that both DS and WF are comparable in accuracy to both MC and MLE in most cases tested in this thesis. It was also found that both DS and WF use approximately the same number of function calls as MLE, and that all three methods use far fewer function calls to the simulation than MC. It was discovered that WF is not always able to find a solution, probably because the methods used for WF are not optimal for what they are supposed to do. Finding better methods for WF is something that could be investigated further.

Subject reader (Ämnesgranskare): Henrik Sjöstrand. Supervisor (Handledare): Arne Sahlberg.


Popular Science Summary (Populärvetenskaplig sammanfattning)

Simulations have long been an essential part of modern technology and science. In nuclear technology this is especially important, since real experiments are often too expensive or impossible to carry out. An important part of performing simulations is to estimate the uncertainty in the results of the simulations. To estimate this uncertainty, something called inverse uncertainty quantification is sometimes used. Inverse uncertainty quantification is about using the results of a simulation to estimate the values of the input parameters that define the model used for the simulations. Inverse uncertainty quantification is also about determining the uncertainty in the estimate of the input parameters.

The purpose of this thesis is to test two new methods for inverse uncertainty quantification against two more established methods.

Problem and method

A simple example of an input parameter in a model could be the gravitational acceleration in a model of the time it takes to fall from a certain height. In that example, it is the gravitational acceleration we want to estimate. To exemplify what has been done in this report, we will use another example which is not quite as simple. Assume that we have a model described by the function

\[
y = x^3, \tag{1}
\]

where y in this case is the output parameter and x is the input parameter. Let us assume that we have a measured value of y equal to 3 and that the uncertainty of the measurement of y is 1, where the uncertainty is described by a standard deviation. Inverse uncertainty quantification is about using the measured value of y and the uncertainty of this measurement to estimate the most likely value of x and to determine the uncertainty of that estimate. One way of doing this is called Monte Carlo sampling of output parameters, which is one of the methods our new methods are compared against. The idea of this method is to use randomly drawn samples that together have the same standard deviation as the uncertainty of y and a mean corresponding to the measured value of y.

We sample 7 points from y and use some numerical method to find, for each, the corresponding value of x that satisfies the equation above.

  y | 2.13  4.11  3.64  4.92  3.46  2.94  2.37
  x | 1.29  1.60  1.54  1.70  1.51  1.43  1.33

The top row of the table shows seven randomly drawn values of y with a mean of 3 and a standard deviation of 1. The row below shows the corresponding value of x for each value of y; that is, each x-value satisfies the equation y = x³. We can now estimate the most likely value of x and the uncertainty of x from all the x-values we have computed. Doing so, we find that the x-values have a mean of 1.49 and a standard deviation of 0.14. From this we can conclude that the most likely value of x is 1.49 and that the uncertainty of this value is 0.14. One problem with this method is that one actually needs many more samples of y to get a good estimate of x and of the uncertainty of x. Using many samples can be a problem, since the inverse uncertainty quantification may then take a very long time. Therefore we test a new method that drastically reduces the number of samples. This method is called Deterministic Sampling, and the idea is to use very few samples of y, but to choose these samples in a clever way, so that they have the same mean as the measured value of y and a standard deviation equal to the uncertainty of the measurement of y. These y-values are called an ensemble. In our example we need to find an ensemble with a mean of 3 and a standard deviation of 1. The simplest ensemble satisfying this consists of the y-values 2 and 4. If we now make the corresponding table for Deterministic Sampling and compute the x-value for each y-value, we get

  y | 2     4
  x | 1.26  1.59

To estimate the value of x, the mean of the x-points in the table can be taken, which gives 1.42, and the standard deviation becomes 0.16. This means that DS estimates the true value of x to be 1.42 and the uncertainty of this estimate to be 0.16. As mentioned earlier, one more inverse uncertainty quantification method is tested in this thesis, called Weight Fixing. That method is somewhat reminiscent of Deterministic Sampling but performs inverse uncertainty quantification in a different way. The purpose of the thesis is to test how well the two new methods work compared with Monte Carlo, as explained above, and with one more established method called Maximum Likelihood Estimation (MLE).

Tests and results

To see how good the different methods are, six different tests were made, with different models in each test. To get a more reliable result, each model was tested several times, with slightly different measured values of y each time. The tests show that both Deterministic Sampling and Weight Fixing perform comparably to the already established methods. Another conclusion from the results is that Weight Fixing and Deterministic Sampling require less computing power than Monte Carlo for the same problem, but about as much computing power as MLE.

With WF there is, however, a problem in that the method does not always work. How often it works depends on the model, but for one of the models used the method succeeded roughly 75% of the times it was tried. A further development of this method would be to find another way to implement Weight Fixing, so that it succeeds essentially every time, regardless of which model is used.

Acknowledgments

I wish to show my gratitude to the Swedish Radiation Safety Authority (Strålsäkerhetsmyndigheten) for financing the supervision of this thesis by contract SSM2019-6421.

A special thanks goes to Peter Hedberg at Strålsäkerhetsmyndigheten, who initiated the idea that deterministic sampling could also be used for model calibration and helped with supervising the project.

I would like to pay my special regards to Arne Sahlberg at Uppsala University for being my supervisor and helping me throughout the project with all aspects of it. My special regard also goes to my subject reader Henrik Sjöstrand at Uppsala University for his expertise throughout the thesis. Lastly, I would also like to thank Gustav Robertson at Uppsala University for helping me with the implementation of one of the real-world models used in this thesis.


Contents

1 Introduction
2 Theory
  2.1 Probability and Statistics
    2.1.1 Single variable probability
  2.2 Single Variable statistics
    2.2.1 Covariance
    2.2.2 The normal distribution
3 Methodology
  3.1 Inverse Uncertainty Quantification
    3.1.1 Maximum likelihood estimation and the Hessian matrix
    3.1.2 Monte Carlo methods
    3.1.3 Deterministic sampling
    3.1.4 Weight Fixing
  3.2 Models
    3.2.1 Artificial models
    3.2.2 Oxide-layer thickness model
    3.2.3 Neutron emission spectra model
  3.3 Benchmarking
4 Results
  4.1 One dimensional case
  4.2 Multidimensional case
  4.3 Oxide-layer thickness
  4.4 Neutron emission spectra
5 Discussion
  5.1 Overall performance
  5.2 Weight Fixing
  5.3 Deterministic sampling
  5.4 Number of function calls
  5.5 Sources of error
6 Conclusions
7 Outlook
  7.1 DS and WF in higher dimension
  7.2 Improving WF
  7.3 Real world examples
References
A Estimating the covariance matrix
B Linear programming
C Modified Hadamard Matrix
  C.1 Hadamard Matrix
  C.2 Modified Hadamard Matrix

Notation

q              arbitrary random variable; bold denotes a vector
q_i            one component of the vector q
f(q)           probability density function (PDF) of q
⟨q⟩            mean
⟨δ²q⟩          variance
σ              standard deviation
⟨δⁿq⟩          n:th moment about the mean
q̃              ensemble of sampled q-points
q^(i)          one sampled q-point
N              number of sampled points
w              weights for calculating weighted moments
⟨δq_i δq_j⟩    covariance between q_i and q_j
ρ_{q_i,q_j}    correlation between q_i and q_j
C              covariance matrix
x              input parameter; bold denotes a vector
K              number of input variables
y              output parameter; bold denotes a vector
M              number of output variables
ε              random variable that models the error in the measurement of y
L(x)           the likelihood function with respect to x
l              the log-likelihood function
H              the Hessian matrix of l with respect to x
x̃              ensemble of x-points
ỹ              ensemble of y-points
X              the rough region around the true value of x
d_i            relative difference between the estimate of x and the true value of x
d̄              mean of all d_i
σ_d            standard deviation of all d_i
R̄_σ            the mean of the ratio between each reality's estimate of the standard deviation and the standard deviation of the absolute evaluation bias
χ²/N           chi squared of the d_i; shows how close a set of variables is to the model

1 Introduction

Computer simulations and mathematical modeling have long been essential in most parts of both science and technology. This is especially prevalent in nuclear technology, where large-scale experiments are often impossible or too expensive; instead, large computer simulations are often used to perform these experiments. An important part of doing these simulations is also to quantify the uncertainty of the results, both for the input and the output parameters. To quantify the uncertainty in the input parameters, Inverse Uncertainty Quantification (IUQ) can be used.

Inverse Uncertainty Quantification is the process of estimating the values of the input parameters in a simulation, and the uncertainty of said estimation, given a measurement of the output parameters. IUQ has been used in several applications, for example for estimating the uncertainty in input parameters for nuclear fuel simulations [16].

A problem with IUQ is that it can be computationally heavy, because it usually involves many executions of a computationally costly function.

Monte Carlo sampling of output parameters (MC) is an IUQ method in which randomly sampled points are generated from the distribution describing the uncertainty of the output parameters. For each sampled point, a corresponding input is then found, and from these inputs the distribution of the input can be estimated. P. Helgesson et al. [17] state that the advantages of Monte Carlo methods are that they can propagate uncertainties even for nonlinear cases and that they do not need to assume that the output distribution is normal. However, some disadvantages of Monte Carlo are also brought up, mainly the computational cost, which is high compared to other IUQ methods. One way to minimize the computational cost is to use fewer samples; for MC methods, however, this leads to an IUQ with less precision. But there exist sampling methods that can reduce the number of samples without the same loss of precision. Such methods include, for example, stratified sampling and Latin hypercube sampling, which use a sophisticated method for choosing the regions from which random samples are drawn [3].

One way to minimize the number of samples, which has been tested and used for forward Uncertainty Quantification, is Deterministic Sampling (DS). The basic idea behind the IUQ method with DS proposed here is that instead of using random samples to represent the distribution, as in MC, samples are selected in such a way that they have the same statistical moments as the distribution of the output. These statistical measures include the mean, the variance, and the covariance. An example where deterministic sampling is used successfully for forward uncertainty propagation is Sahlberg et al. [15], who conclude that deterministic sampling works in their case for estimating neutron rate error and that it is, in this case, comparable to Monte Carlo methods. One important thing to mention about DS is that another approach to IUQ with DS has been tested by Hessling [9], who uses stratifying and annealing for IUQ, which is not used in this thesis.

Since DS has, at least in some cases, been shown to be comparable to MC for forward propagation, this thesis investigates whether DS is also a good method for IUQ. The method that uses DS for IUQ does the same as MC, but with samples chosen by DS as described earlier. This thesis also investigates another new approach to IUQ, which uses random samples from the rough region around the input to create a linear problem that involves finding the right weights so that the output has the right statistics. This method is similar to another IUQ method called Importance Sampling, which is implemented by Helgesson et al. in [17]; the difference between this new method and Importance Sampling lies mainly in the approach to finding the right set of weights. The aim of this thesis is to benchmark the DS and WF methods against a traditional MC method and a traditional method using the inverse of the Hessian to estimate the uncertainty (MLE).

2 Theory

2.1 Probability and Statistics

This section starts by introducing the statistical concepts that are needed later to understand IUQ and DS.

2.1.1 Single variable probability

The probability density function (PDF) is a function f(q) that describes how likely it is for a random variable q to take on a certain value. The probability of finding a value in a certain interval [a, b], P(q ∈ [a, b]), is calculated by integrating f(q) over the interval, which gives the following expression:

\[
P(q \in [a, b]) = \int_a^b f(q)\,dq. \tag{2}
\]

Note that if a → −∞ and b → ∞ the probability is always one, since this interval contains all possible values q can take.

Two important measures for any random variable and its PDF are the expected value and the variance. The expected value, also known as the mean value, is the average value of a random variable q; it is denoted ⟨q⟩ and can be calculated by

\[
\langle q \rangle = \int_{-\infty}^{\infty} q f(q)\,dq. \tag{3}
\]

The variance is a measure of how much a random variable deviates from the mean and is in this thesis denoted ⟨δ²q⟩. It is defined as

\[
\langle \delta^2 q \rangle = \int_{-\infty}^{\infty} (q - \langle q \rangle)^2 f(q)\,dq \tag{4}
\]

or equivalently as

\[
\langle \delta^2 q \rangle = \langle (q - \langle q \rangle)^2 \rangle. \tag{5}
\]

Because the variance of q has the square dimension of q, an often used quantity is the square root of the variance, which is known as the standard deviation and is defined as

\[
\sigma_q = \sqrt{\langle \delta^2 q \rangle}. \tag{6}
\]

These concepts of variance and mean can be generalized by what is known as moments.

There are two kinds of moments. The first kind is known as moments about the origin, defined as the expected value of qⁿ and denoted accordingly as ⟨qⁿ⟩. The second kind is known as central moments, defined as

\[
\langle \delta^n q \rangle = \langle (q - \langle q \rangle)^n \rangle. \tag{7}
\]

In this thesis, the n:th moment means the n:th central moment, except for the first moment, which means the expected value. The third moment is known as the skewness and is a measure of how symmetric a distribution is. A positive skewness means that large deviations from the expectation value are more likely to be in the positive direction, and vice versa for a negative skewness. A symmetric distribution therefore has a skewness of zero. The fourth moment is known as the kurtosis and is a measure of how likely the distribution is to produce samples with significantly larger deviations from the expectation value than the standard deviation; like the variance, it only assumes positive values [10]. Note that this thesis uses the non-normalized versions of skewness and kurtosis if not otherwise stated.

2.2 Single Variable statistics

Consider some random variable q from which N measured data points q̃ = {q^(1), q^(2), ..., q^(N)} have been taken. These data points can be used to estimate the statistics of q. To distinguish the actual mean, variance, etc. from the quantities calculated from data points, the latter are preceded by the word sample. The sample mean is used to estimate the true mean and is calculated as follows:

\[
\langle q \rangle = \frac{1}{N} \sum_{i=1}^{N} q^{(i)}. \tag{8}
\]

Calculating the variance from a set of data can be done in two different ways depending on the context: there is the population variance and the sample variance. The population variance is used when the data points contain all points from a population, while the sample variance is used when a random subset of the population is used to estimate the variance of the whole population. The population and sample variance are defined as

2qipopulation= 1 N

N

X

i=1

(q(i)− hqi)2, (9)

(13)

2qisample = 1 N − 1

N

X

i=1

(q(i)− hqi)2

[4]. In this thesis, the population variance is used when talking about calculating the variance from a set of points. The reason behind this is that DS uses points that constitute a whole population.

Skewness, kurtosis, and even higher-order moments are calculated similarly to how the variance is calculated. The formula for the n:th sampled moment is

\[
\langle \delta^n q \rangle = \frac{1}{N} \sum_{i=1}^{N} (q^{(i)} - \langle q \rangle)^n, \tag{10}
\]

where n = 3 gives the skewness and n = 4 gives the kurtosis [1].

Sometimes when calculating the mean, variance, skewness, and kurtosis it is not desirable to treat all samples equally; instead a weight is given to each sample. The idea behind weighted samples is that each sample q^(i) is associated with a weight w_i. The formula for the weighted mean then becomes

\[
\langle q \rangle = \frac{\sum_{i=1}^{N} w_i q^{(i)}}{\sum_{i=1}^{N} w_i} \tag{11}
\]

and the formula for the other weighted moments becomes

\[
\langle \delta^n q \rangle = \frac{\sum_{i=1}^{N} w_i (q^{(i)} - \langle q \rangle)^n}{\sum_{i=1}^{N} w_i}. \tag{12}
\]

In this thesis, the weights have the extra constraint that they have to sum to one. The reason for this is to avoid having to divide by the sum of the weights in equations (11) and (12). Note that if w_i = 1/N for all i in equations (11) and (12), then these equations become equal to (8) and (10) respectively.
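To make the weighted-moment formulas concrete, here is a minimal Python sketch (not code from the thesis; the two-point ensemble {2, 4} with equal weights is borrowed from the popular science summary):

```python
import numpy as np

def weighted_moment(q, w, n):
    """Weighted n:th central moment, equations (11)-(12); w must sum to one.

    n = 1 returns the weighted mean, n = 2 the weighted variance,
    n = 3 the weighted skewness and n = 4 the weighted kurtosis.
    """
    mean = np.sum(w * q)
    if n == 1:
        return mean
    return np.sum(w * (q - mean) ** n)

q = np.array([2.0, 4.0])       # two-point ensemble from the summary
w = np.array([0.5, 0.5])       # equal weights summing to one

print(weighted_moment(q, w, 1))   # 3.0 (weighted mean)
print(weighted_moment(q, w, 2))   # 1.0 (weighted variance)
```

With equal weights w_i = 1/N this reduces to the unweighted formulas (8) and (10), as noted above.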

2.2.1 Covariance

Consider a set of random variables q = {q₁, q₂, ..., q_M} that are not necessarily independent of each other. A measure of how dependent they are is called the covariance. The covariance between two variables is denoted ⟨δq_i δq_j⟩ and is defined as the difference between the expected value of the product of the two variables and the product of their respective expected values:

\[
\langle \delta q_i \delta q_j \rangle = \langle q_i q_j \rangle - \langle q_i \rangle \langle q_j \rangle. \tag{13}
\]

A non-zero covariance indicates that two variables are not independent of each other. If the covariance is negative, the variables are negatively correlated, and if it is positive, they are positively correlated. Two things to note about the covariance are that it is symmetric with respect to i and j, and that if i = j the variance is obtained. A visual example of how the covariance affects a distribution can be seen in figure 1, where the heat maps of three two-dimensional multivariate normal distributions are shown. All three heat maps have the same mean and variance for all the variables, but the covariance differs between them. In the left picture, where the covariance is zero, the distribution is radially symmetric. This is not the case for the other two heat maps, where the covariance is non-zero. In the middle heat map, with negative covariance, high y-values often correspond to low x-values and low y-values to high x-values, i.e., the variables are negatively correlated. In the right heat map, with positive covariance, high y-values often correspond to high x-values and low y-values to low x-values, which means that they are positively correlated.

Figure 1: The figure shows the heat maps of three two-dimensional multivariate normal distributions with different covariances. In all three heat maps, both random variables have means equal to zero and variances equal to one. In the left heat map, the covariance between the two random variables is zero; in the middle one it is -0.5 and in the right one it is 0.5.

Since the value of the covariance depends on the size of the variables, it is sometimes hard to say how much correlation a given covariance corresponds to. A more convenient measure of how correlated two random variables are is the correlation, denoted ρ_{q_i q_j} and defined as:

\[
\rho_{q_i q_j} = \frac{\langle \delta q_i \delta q_j \rangle}{\sqrt{\langle \delta^2 q_i \rangle \langle \delta^2 q_j \rangle}}. \tag{14}
\]

Because of this scaling of the covariance by the square root of the two variances, the correlation can only take on values between -1 and 1 [5]. A useful way of representing the covariance for a set of random variables is the covariance matrix C,

which is defined as

\[
C = \begin{pmatrix}
\langle \delta^2 q_1 \rangle & \langle \delta q_1 \delta q_2 \rangle & \cdots & \langle \delta q_1 \delta q_N \rangle \\
\langle \delta q_2 \delta q_1 \rangle & \langle \delta^2 q_2 \rangle & \cdots & \langle \delta q_2 \delta q_N \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \delta q_N \delta q_1 \rangle & \langle \delta q_N \delta q_2 \rangle & \cdots & \langle \delta^2 q_N \rangle
\end{pmatrix}. \tag{15}
\]

Because the covariance is symmetric with respect to i and j, so is the covariance matrix. Moreover, if i = j the variance is obtained, so the main diagonal of the covariance matrix contains the variance of each variable q_i.

Like the variance, the covariance of sampled points can be calculated in two different ways depending on the context, dividing either by N or by N − 1. Like the variance, the covariance in this thesis is calculated as what is known as the population covariance. The reason for this is that deterministic sampling uses a whole population. The population covariance of two variables can be calculated as follows:

\[
\langle \delta q_i \delta q_j \rangle = \frac{1}{N} \sum_{n=1}^{N} \left( q_i^{(n)} - \langle q_i \rangle \right)\left( q_j^{(n)} - \langle q_j \rangle \right), \tag{16}
\]

where q_i^(n) and q_j^(n) are the n:th samples of q_i and q_j respectively [11]. There is also the weighted covariance, calculated as

\[
\langle \delta q_i \delta q_j \rangle = \sum_{n=1}^{N} w_n \left( q_i^{(n)} - \langle q_i \rangle \right)\left( q_j^{(n)} - \langle q_j \rangle \right), \tag{17}
\]

where a weight w_n has been associated with each sampled point (q_i^(n), q_j^(n)).

2.2.2 The normal distribution

The PDF that is mostly used in this report is the normal distribution, which for one random variable q is defined as

\[
f(q) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(q-\mu)^2}{2\sigma^2} \right), \tag{18}
\]

where μ is the mean and σ² is the variance. The skewness of the normal distribution is zero since the function is symmetric about the mean. The kurtosis of the normal distribution is ⟨δ⁴q⟩ = 3σ⁴. One property of the normal distribution is that the mean value is also the value with the highest probability. A random variable q that is normally distributed with mean μ and standard deviation σ is written q ∼ N(μ, σ).

The normal distribution also exists for more than one random variable and is then known as the multivariate normal distribution. The PDF for a set of random variables q = {q₁, q₂, ..., q_N} that follows a multivariate normal distribution with covariance matrix C is defined as

\[
f(\mathbf{q}) = \frac{1}{\sqrt{(2\pi)^N \det(C)}} \exp\left( -\frac{1}{2} (\mathbf{q} - \boldsymbol{\mu})^{T} C^{-1} (\mathbf{q} - \boldsymbol{\mu}) \right), \tag{19}
\]

where q is an N × 1 vector with all the random variables q_i, μ is an N × 1 vector with the mean of each random variable, and det(C) is the determinant of the covariance matrix. A set of random variables q that has a multivariate normal distribution with mean vector μ and covariance matrix C is denoted q ∼ N(μ, C).

3 Methodology

This methodology section first describes what Inverse Uncertainty Quantification (IUQ) is and how the different IUQ methods work. It then describes how the different IUQ methods are tested and which measures are used to benchmark them.

3.1 Inverse Uncertainty Quantification

The aim of Inverse Uncertainty Quantification (IUQ) is to estimate input parameters given the output and its uncertainty. IUQ is also about quantifying the uncertainty of this estimation. Consider M experimental data outputs y = (y₁, y₂, ..., y_M), where each y_i is a different measured quantity. In this case y depends on K variables x = (x₁, x₂, ..., x_K), where again each x_i is a different quantity. It is important to point out that M and K do not have to be the same. The case where M = K = 1 is called the one-dimensional case and all other cases are called the multidimensional case. Due to uncertainties in the measurements of y, there exists no exact relation y = f(x); instead, y has some uncertainty distribution. The relation between y and x can be stated as

\[
\mathbf{y} = f(\mathbf{x}) + \boldsymbol{\varepsilon}, \tag{20}
\]

where ε is an experimental error term that models the error in the measurement. This error term is modeled as ε ∼ N(0, σ) in one dimension and as ε ∼ N(0, C) in several dimensions, where σ and C are the standard deviation and the covariance matrix of the error term respectively. The mean of ε is zero. IUQ is then about trying to estimate the value of x given a value of y and knowledge of C or σ, and about quantifying the uncertainty in the estimate of x. It is only the error term ε that makes y a random variable; x, and therefore also f(x), always has the same value. A consequence is that the standard deviation of y is the same as the standard deviation of ε. In the multidimensional case, this means that the covariance matrices of y and ε are the same.

The model parameters are often denoted by θ in other papers, and the independent variables are often denoted by x. In this thesis, however, x denotes the model parameters and the model has no independent variables. Having no independent variables means, in this case, that y is measured several times with the same initial conditions, and from those measurements a mean and standard deviation of y are calculated. Another instance where the IUQ model in (20) can be applied is when we have a single measurement and the uncertainty of said measurement is known. An example of a model that could be used in equation (20) is a model of the time it takes for something to fall from a distance d with zero initial velocity, where the goal is to estimate the acceleration of gravity g. Such a model would be described by

\[
t = \sqrt{\frac{d}{g}}, \tag{21}
\]

where t is the time it takes to fall. If we assume d is a known quantity, then for this model y = t and x = g.

The IUQ methods explained in this section are Maximum Likelihood Estimation in section 3.1.1, Monte Carlo sampling of output parameters in section 3.1.2, Deterministic Sampling in section 3.1.3, and Weight Fixing in section 3.1.4.

3.1.1 Maximum likelihood estimation and the Hessian matrix

Maximum likelihood estimation (MLE) is a method for finding the most probable value of a parameter given some output data. Consider the relation in equation (20) in the one-dimensional case and some experimental data y = {y^(1), y^(2), ..., y^(N)}, where each point y^(i) has some uncertainty with standard deviation σ_i. For each point y^(i) there is a likelihood function P_i(x : y^(i)) describing how compatible the measurement y^(i) is with a parameter value x. An example of this, with a normal distribution for each y^(i) and the function y = f(x), is

\[
P_i(x : y^{(i)}) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left( -\frac{(f(x) - y^{(i)})^2}{2\sigma_i^2} \right). \tag{22}
\]

However, P_i only describes the likelihood function for one data point y^(i), and we want to find the most probable value considering all data points. Therefore the total likelihood function L is introduced as the product of the likelihood functions P_i of all the individual samples y^(i):

\[
L \equiv \prod_{i=1}^{N} P_i(x : y^{(i)}). \tag{23}
\]

The x-values that maximize L are then the most likely values for x given the output data y. To make the maximization easier for computers by avoiding very small numbers, it is better to calculate the maximum of the logarithm of the likelihood function. The logarithm of the likelihood function, more commonly known as the log-likelihood function, is denoted by l. The reason why L and l have a maximum at the same x-value is that the logarithm is a strictly increasing function. Another advantage of using l is that the product becomes a sum because of the logarithm rules. l can then be expressed as

\[
l = \ln L = \ln\left( \prod_{i=1}^{N} P_i(x : y^{(i)}) \right) = \sum_{i=1}^{N} \ln\left( P_i(x : y^{(i)}) \right). \tag{24}
\]

The last thing to consider with the maximum likelihood method is that it does not calculate the expectation value of x, but instead the x-value with the highest likelihood, i.e., the most likely value given the data. These two values of x are not generally the same for a distribution, but they are the same for the normal distribution, which is why the method can still be useful when dealing with a distribution where the expected value and the maximum are very close [19]. Further, the expectation value itself can often be interesting to find.
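To make the procedure concrete, here is a minimal sketch of one-dimensional MLE in Python (not code from the thesis; the model y = x³ and the numbers are borrowed from the artificial test case in section 3.2.1, and the second derivative of the log-likelihood is approximated by a central finite difference, anticipating equation (25)):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):                         # model from the first artificial test case
    return x ** 3

y_meas, sigma = 27.0, 4.05        # measured output and its uncertainty

def nll(x):                       # negative log-likelihood, cf. equation (24)
    return 0.5 * (f(x) - y_meas) ** 2 / sigma ** 2

x_hat = minimize_scalar(nll, bracket=(1.0, 5.0)).x

# variance of the estimate from the inverse Hessian, equation (25):
# C = -H^{-1} = 1 / nll''(x_hat), with the second derivative taken numerically
h = 1e-4
d2 = (nll(x_hat + h) - 2 * nll(x_hat) + nll(x_hat - h)) / h ** 2
print(x_hat, 1.0 / np.sqrt(d2))   # estimate of x and its standard deviation
```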

This method for finding the maximum likelihood of a one-dimensional problem can easily be generalized to more dimensions. The big difference is that each likelihood function P_i(x : y^(i)) now depends on several variables, so x → x = {x₁, x₂, ..., x_K} and y^(i) → y^(i) = {y₁^(i), y₂^(i), ..., y_M^(i)}. Since P_i is a probability, it is still a single-output function, and finding the maximum likelihood is still just a matter of maximizing the log-likelihood function. One property of l is that if L is just the normal distribution, meaning that f is defined as in equation (19), then the covariance matrix C of the uncertainty in the estimate of x can be calculated as

\[
C = -H^{-1}, \tag{25}
\]

where H in this case is the Hessian matrix of l. The Hessian matrix of the multivariate log-likelihood function l(x₁, x₂, ..., x_K) is defined as

\[
H = \begin{pmatrix}
\frac{\partial^2 l}{\partial x_1^2} & \frac{\partial^2 l}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 l}{\partial x_1 \partial x_K} \\
\frac{\partial^2 l}{\partial x_2 \partial x_1} & \frac{\partial^2 l}{\partial x_2^2} & \cdots & \frac{\partial^2 l}{\partial x_2 \partial x_K} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 l}{\partial x_K \partial x_1} & \frac{\partial^2 l}{\partial x_K \partial x_2} & \cdots & \frac{\partial^2 l}{\partial x_K^2}
\end{pmatrix}. \tag{26}
\]

A proof of equation (25) can be found in Appendix A.

3.1.2 Monte Carlo methods

The Monte Carlo (MC) methods for IUQ are a family of IUQ methods built on using random samples in some way. There exist many of them, but what they all have in common is that they use at least pseudo-random numbers to represent a distribution. The main advantage of MC is that it is a rather robust method. The Monte Carlo method used in this thesis is Monte Carlo sampling of output parameters.

First consider this method in the one-dimensional case of equation (20). Create an ensemble ỹ = {y^(1), y^(2), ..., y^(N)} of N points that have the same statistics as y, by sampling points from the distribution of y, which in this case is a normal distribution. Then find an ensemble x̃ = {x^(1), x^(2), ..., x^(N)} using some numerical method for solving nonlinear equations (such as the Nelder–Mead method) that, for each y^(i), finds the corresponding x^(i) that fulfills

\[
f(x^{(i)}) = y^{(i)}. \tag{27}
\]

Then use equation (8) to estimate the expected value of x, and the sample variance formula from section 2.2 to get the variance of the estimate.
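A minimal sketch of this one-dimensional procedure in Python (illustrative only; the model and numbers are from the y = x³ test case of section 3.2.1, and brentq is just one possible choice of nonlinear solver):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)

def f(x):
    return x ** 3

y_mean, sigma_y, N = 27.0, 4.05, 500          # 500 samples, as in section 3.3

y_ens = rng.normal(y_mean, sigma_y, size=N)   # ensemble with the statistics of y

# for each sampled y, solve f(x) = y, equation (27), with a bracketing solver
x_ens = np.array([brentq(lambda x, yi=yi: f(x) - yi, 0.1, 10.0) for yi in y_ens])

print(x_ens.mean(), x_ens.std(ddof=1))        # estimate of x and its uncertainty
```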

The multidimensional case works much like the one-dimensional case, only with some small differences. One of them is that the sampled points are now M-dimensional and are sampled from a multivariate normal distribution to create the ensemble ỹ. This multivariate normal distribution should have mean ⟨y⟩ and covariance matrix C_y, where C_y is the covariance matrix of the distribution of y. Then, for each M-dimensional point, find the corresponding K-dimensional point in x that satisfies equation (27) to create the ensemble x̃. This can again be done with numerical methods for solving nonlinear equations. Then from x̃ compute the mean and the covariance matrix from equations (8) and (16).

There also exist more sophisticated sampling algorithms, as mentioned in the introduction. Moreover, other Monte Carlo methods use different approaches to IUQ; such methods use, for example, Markov chains and Bayesian statistics and are not explained in this report.

3.1.3 Deterministic sampling

Deterministic sampling (DS) is a method that has been shown to work for forward propagation of input parameter uncertainty; examples of this can be seen in [8], [14] and [12]. DS has its foundation in the unscented transform, first proposed by Uhlmann in [13]. Like random sampling, deterministic sampling is based on the idea of representing a distribution by a set of points. But unlike random sampling, which uses many random points to represent the distribution, deterministic sampling uses only a few selected values, so-called sigma points, with specific weights attached to them, to represent the whole distribution. The collection of sigma points is called an ensemble. The sigma points and their corresponding weights are chosen such that they have the same statistical moments, up to some moment, as the parameter.

Consider the one-dimensional case with a random variable y which has some mean ⟨y⟩ and variance ⟨δ²y⟩. The goal is then to create an ensemble ỹ = {y^(1), y^(2), ..., y^(N)} of N sigma points y^(i) with corresponding weights w_i that fulfill

\[
\begin{cases}
1 = \sum_{i=1}^{N} w_i \\
\langle y \rangle = \sum_{i=1}^{N} w_i y^{(i)} \\
\langle \delta^2 y \rangle = \sum_{i=1}^{N} w_i (y^{(i)} - \langle y \rangle)^2.
\end{cases} \tag{28}
\]

The top equation makes sure that the sum of all the weights is one, which is a requirement when calculating weighted moments. The middle and bottom equations make sure that the ensemble has the right weighted mean and the right weighted variance respectively. The simplest ensemble and weights that satisfy all three conditions are y^(1,2) = ⟨y⟩ ± σ_y with w_{1,2} = 1/2, where σ_y is the standard deviation of y.

If ỹ should also have the same skewness and kurtosis as y, the weights and the ensemble must fulfill additional constraints that make sure that ỹ has the right weighted skewness and weighted kurtosis. These two constraints are

3yi =

N

X

i=1

wi(y(i)− hyi)3 (29)

4yi =

N

X

i=1

wi(y(i)− hyi)4 (30)

where ⟨δ³y⟩ is the skewness of y and ⟨δ⁴y⟩ is the kurtosis of y. Because of the additional constraints on ỹ, additional points have to be added compared to when only the mean and variance were taken into consideration. Assuming that the distribution of y is a normal distribution, the simplest ensemble that fulfills all five equations is ỹ = {⟨y⟩ − √3σ_y, ⟨y⟩, ⟨y⟩ + √3σ_y} with weights w = {1/6, 2/3, 1/6}. A more detailed explanation of how this solution is found can be seen in [20].

When ỹ is found, the next step is very similar to how it is done for random sampling: for each sigma point in ỹ, find a corresponding x-point. All x-points then form an ensemble x̃ = {x^(1), x^(2), ..., x^(N)}, where each x^(i) has to fulfill f(x^(i)) = y^(i). Finding the statistics is now simple, using equation (11) to find the mean of x and equation (12) with n = 2 to find the variance of x.

Finding an ensemble in a higher dimension is not as simple as in one dimension, but it is still quite straightforward for multivariate normal distributions. Firstly, an ensemble in M dimensions is represented by a matrix as follows:

\[
\tilde{y} = \begin{pmatrix}
y_1^{(1)} & y_2^{(1)} & \cdots & y_M^{(1)} \\
y_1^{(2)} & y_2^{(2)} & \cdots & y_M^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
y_1^{(N)} & y_2^{(N)} & \cdots & y_M^{(N)}
\end{pmatrix}_{N \times M}
\qquad
\tilde{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{pmatrix}_{N \times 1} \tag{31}
\]

where each row represents the components of one sigma point and each column represents the same component for different sigma points. The advantage of this notation is that it is now easier to calculate the mean and the covariance matrix via known matrix operations. Let the mean of all variables y_i be represented by a row vector ⟨y⟩ = (⟨y₁⟩, ⟨y₂⟩, ..., ⟨y_M⟩). The mean can then be calculated by

\[
\langle \mathbf{y} \rangle^T = \tilde{y}^T \tilde{w} \tag{32}
\]

and the covariance matrix by

\[
C = \delta\tilde{y}^T \left( \delta\tilde{y} \circ (\tilde{w} \otimes 1_{1 \times M}) \right), \tag{33}
\]

where 1_{a×b} is an a-by-b matrix where all elements are ones, ∘ denotes the elementwise product, and ⊗ is the Kronecker matrix product. The definition of δỹ is

\[
\delta\tilde{y} = \tilde{y} - 1_{N \times 1} \otimes \langle \mathbf{y} \rangle. \tag{34}
\]

For convenience and without loss of generality, the mean of all variables is zero and the variance is one. In the multi-dimensional case, only ensembles that encode the mean, variances, and covariance are considered. An easy ensemble that encodes the right mean and variance and zero covariance is the binary ensemble discussed by Hessling in [8]. The binary ensemble is a generalization of the one-dimensional ensemble ỹ = {1, −1}. It consists of only 1 and −1 and has to be constructed in such a way that all columns of the ensemble have a mean of zero. Also, all the columns have to be independent of each other, i.e., they have zero covariance. A type of matrix that satisfies all these conditions and is similar to the one used by Hessling is the modified Hadamard matrix, explained further in Appendix C. So an ensemble in M dimensions is simply the M:th modified Hadamard matrix. The weights for all samples are the same, w_i = 1/r, where r is the number of rows in ỹ.

The next step is to encode a covariance into the ensemble. This is done by skewing the non-covariant ensemble to introduce a dependence between the parameters. Let ∆ denote the matrix square root of the covariance matrix C, i.e.,

\[
\Delta \Delta^T = C. \tag{35}
\]

The matrix square root is found by diagonalizing the covariance matrix and then calculating ∆ by

\[
\Delta = S^{-1} \sqrt{D}\, S, \tag{36}
\]

where S is a matrix with the eigenvectors of C as rows and D is a diagonal matrix with the eigenvalues of C as its components. The rows of ∆ are then used to define a new set of normal basis vectors {b̂₁, b̂₂, ..., b̂_d}, defined as

\[
\hat{b}_i = \frac{\Delta(i,:)}{\|\Delta(i,:)\|}, \tag{37}
\]

where ∆(i,:) denotes the i:th row of ∆ and ||∆(i,:)|| denotes the norm of that row. The new ensemble ỹ_cov that encodes the covariance can then be calculated by

\[
\tilde{y}_{\text{cov}} = \tilde{y}_{\text{indep}} B, \tag{38}
\]

where ỹ_indep denotes the ensemble with no covariance and B denotes the matrix where each row is one of the basis vectors b̂_i, such that B(i,:) = b̂_i. The weight for each point is still the same after skewing [20].
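The multidimensional ensemble construction can be sketched in Python as follows (illustrative; scipy's standard Hadamard matrix, with its all-ones column dropped, stands in for the modified Hadamard matrix of Appendix C, the symmetric matrix square root from scipy is used directly for the skewing rather than the normalized basis vectors of equation (37), and the covariance values are arbitrary):

```python
import numpy as np
from scipy.linalg import hadamard, sqrtm

M = 2                                    # output dimension
C = np.array([[0.36, 0.10],
              [0.10, 1.21]])             # hypothetical target covariance

# binary ensemble: drop the all-ones column of a Hadamard matrix; the remaining
# columns have zero mean, unit variance and zero covariance with each other
H = hadamard(4)                          # 4 rows are enough for M = 2
y_indep = H[:, 1:M + 1].astype(float)
w = np.full(len(y_indep), 1.0 / len(y_indep))   # equal weights w_i = 1/r

Delta = sqrtm(C)                         # matrix square root, Delta Delta^T = C
y_cov = y_indep @ Delta.T                # skewed ensemble that encodes C

# weighted mean and covariance in the spirit of equations (32)-(33)
mean = y_cov.T @ w
print(mean)                                               # close to zero
print((y_cov - mean).T @ ((y_cov - mean) * w[:, None]))   # recovers C
```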

Once the y ensemble ỹ is found, the next step in the IUQ is to find, for each sigma point in ỹ, the corresponding x sigma point. This is done exactly as for MC. All the sigma points are then combined into x̃, so that the mean and covariance matrix for x can be calculated by equations (32) and (33) respectively.

3.1.4 Weight Fixing

Weight Fixing (WF) is a proposed IUQ method that builds upon the concepts of the deterministic sampling method used by Sahlberg in [20] for forward propagation. Whereas the DS IUQ described earlier is more or less the same idea as that used for UQ in [20], but performed backwards, WF instead uses the method in [20] for finding the right weights, constructing a method that results in a linear problem.

First, consider the one-dimensional IUQ case. Let X denote the rough region around the true value of x. Then sample N points from X to create an ensemble of x-points x̃ = {x^(1), x^(2), ..., x^(N)}. From x̃, create ỹ = {y^(1), y^(2), ..., y^(N)} by setting y^(i) = f(x^(i)), where f(x) is the function from equation (20). Subsequently, find the right ensemble of weights w̃ such that ỹ has the same moments as the variable y and such that the sum of all the weights is one. This problem is most easily written as a matrix equation of the form Aw̃ = b. In matrix form, the equation is written as

\[
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
y^{(1)} & y^{(2)} & \cdots & y^{(N)} \\
(y^{(1)} - \langle y \rangle)^2 & (y^{(2)} - \langle y \rangle)^2 & \cdots & (y^{(N)} - \langle y \rangle)^2 \\
\vdots & \vdots & \ddots & \vdots \\
(y^{(1)} - \langle y \rangle)^m & (y^{(2)} - \langle y \rangle)^m & \cdots & (y^{(N)} - \langle y \rangle)^m
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{pmatrix}
=
\begin{pmatrix} 1 \\ \langle y \rangle \\ \langle \delta^2 y \rangle \\ \vdots \\ \langle \delta^m y \rangle \end{pmatrix} \tag{39}
\]

where m denotes up to which moment ỹ should agree with y. The first row of the matrix ensures that the weights all sum to one, the second ensures that ỹ has the right mean, and the following rows ensure that ỹ has the right moments up to order m. In most cases, m is two, if only the variance of y is taken into consideration, or sometimes four, if skewness and kurtosis should also be taken into consideration. Note that there should be more points sampled than there are equations for w̃, i.e., more columns in A than rows. This means that the problem is underdetermined, which gives some freedom in finding w̃. Since there is a set of possible solutions, we can seek one where all w_i are positive, which is desirable when using weighted samples to represent a probability distribution.

One way to solve equation (39) with the requirement of only having positive weights is to make use of linear programming, specifically the interior-point method. A detailed explanation of how linear programming can be used to find the right weights can be found in Appendix B.

When a w̃ that fulfills equation (39) is found, these weights can be used to estimate the mean and higher-order moments of x, using equations (11) and (12) respectively.
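A minimal sketch of the one-dimensional WF procedure in Python (illustrative; scipy's linprog with the "highs" solver is used as a stand-in for the interior-point method of Appendix B, and the region X and sample size are arbitrary choices):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

def f(x):
    return x ** 3

y_mean, sigma_y = 27.0, 4.05

x_samples = rng.uniform(2.0, 4.0, size=50)   # rough region X around the true x
y_samples = f(x_samples)

# A w = b for the first two moments, equation (39) with m = 2
A = np.vstack([np.ones_like(y_samples),
               y_samples,
               (y_samples - y_mean) ** 2])
b = np.array([1.0, y_mean, sigma_y ** 2])

# any feasible point with w >= 0 will do, so the objective is zero
res = linprog(np.zeros(len(x_samples)), A_eq=A, b_eq=b,
              bounds=(0, None), method="highs")

if res.success:
    w = res.x
    x_hat = np.sum(w * x_samples)                  # weighted mean, equation (11)
    x_var = np.sum(w * (x_samples - x_hat) ** 2)   # weighted variance, equation (12)
    print(x_hat, np.sqrt(x_var))
else:
    print("no positive weights found: expand X and resample")
```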

Sometimes there exists no set of positive weights that satisfies equation (39). This is usually because X is too small. A small region in x also means a small region in y if f(x) is a continuous function. Then, no matter how the weights are chosen, the weighted moments of order two or higher can only take on some maximum value. To illustrate this, consider an ensemble ỹ of N points. Let y^(min) denote the sample with the smallest value and y^(max) the sample with the highest value. Then the biggest weighted variance ỹ can have is achieved if the weights for y^(min) and y^(max) are 1/2 and all other weights are zero. To fix this problem, X is expanded and additional points are sampled from the newly added region. These points are also used to add new points to ỹ. Then linear programming is used once again to see if there exist weights that make all moments of ỹ correct. If not, this process of expanding X and sampling new points can be repeated until weights that satisfy equation (39) are found.

This method can be generalized to higher dimensions. In the higher-dimensional case, x has K dimensions and X is the K-dimensional region around the point x_estimate, which is the point in K-dimensional space whose coordinates are the best guesses for each parameter x_i in the set of input parameters {x₁, x₂, ..., x_K}. The next step is, like in the one-dimensional case, to sample N points from this region X to create an ensemble x̃. These points are used to calculate ỹ by y = f(x), where y is M-dimensional, i.e., for each x, M outputs are produced. Note that the dimensions of y and x do not have to be the same. The matrix used in equation (39) has to have more rows than in the one-dimensional case, in order to fix more statistical parameters. Like in the one-dimensional case, the first row consists of only ones, to make sure that all the weights sum to one. The next M rows make sure that the means of all y_i are correct. Then M additional rows are needed for the variances of ỹ. For the covariances of ỹ, an additional M(M−1)/2 rows are needed. The total number of rows in the matrix A in equation (39), denoted N_rows, in the case of an M-dimensional output is therefore given by

\[
N_{\text{rows}} = \frac{M^2}{2} + \frac{3M}{2} + 1. \tag{40}
\]

In all cases where M ≠ 1, the kurtosis and skewness of y are not considered, in order not to get too many rows in the matrix A. Note that the number of rows is independent of the dimension of x.

When A and b for the multi-dimensional case are found, w̃ can be computed via linear programming. The mean and covariance matrix of x̃ can then be computed using equations (32) and (33) respectively. Like in one dimension, if there exists no set of weights that can fix all means, variances, and covariances, then X is expanded and additional points are sampled to see if w̃ can then be computed so that ỹ has the right statistics.

3.2 Models

This section describes the models on which the different IUQ methods are tested in the benchmarking. First the artificial models are described, then the two real-world models. There are in total four different artificial models: the first two are one-dimensional, and the last two have two inputs and two outputs. The reason that both one- and two-dimensional models are tested is that we first want to make sure that the methods work by testing them in one dimension, and then test them in two dimensions to see if it is possible to generalize the new methods to higher dimensions. One of the one-dimensional models and one of the two-dimensional models are polynomials; the other two artificial models contain exponential terms. The reason that polynomials and exponential functions are chosen is that we want to test functions that are not linear or close to linear.

3.2.1 Artificial models

Two one-dimensional artificial models are tested. The first one is modeled by the equation

\[
y = x^3. \tag{41}
\]

The true value of x is equal to 3, meaning that the mean of y becomes hyi = 27. The uncertainty of y is assumed to be 15%.

The second one-dimensional artificial model is described by the equation

\[
y = e^x, \tag{42}
\]

where the true value of x is 2, which means that ⟨y⟩ = e² = 7.389. Once again, the uncertainty of y is assumed to be 15%.

Two multi-dimensional artificial models are also tested. In both cases, the function has two input parameters and two output parameters. The first multi-dimensional model is described by the following equation

\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_0 x_1 \\ x_0 + x_1^2 \end{pmatrix}. \tag{43}
\]

The true value of x was assumed to be x_true = (2, 3), which means that the mean value of y becomes ⟨y⟩ = (6, 11). The covariance matrix of y in this case was assumed to be

\[
C_y = \begin{pmatrix} 0.36 & 0 \\ 0 & 1.21 \end{pmatrix}. \tag{44}
\]

The second multi-dimensional artificial model is described by the equation

\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} e^{x_0} + 2x_1 \\ x_0 + e^{-x_1} \end{pmatrix}, \tag{45}
\]

where the true value is x_true = (1, 1.5), which means that the mean of y is ⟨y⟩ = (5.72, 1.22).

The covariance matrix of y was set to

\[
C_y = \begin{pmatrix} 0.736 & 0 \\ 0 & 0.0337 \end{pmatrix}. \tag{46}
\]

3.2.2 Oxide-layer thickness model

One of the models used for benchmarking calculates the thickness of the oxide layer that forms on fuel rods in nuclear reactors. The model is described by Forsberg et al. in [7] by two differential equations that predict the growth of the oxide-layer thickness over time. These two differential equations are

\[
\frac{\partial s}{\partial t} = \frac{A}{s} \exp\left( -\frac{Q_1}{RT} \right) \tag{47}
\]

\[
\frac{\partial s}{\partial t} = C \exp\left( -\frac{Q_2}{RT} \right). \tag{48}
\]

The thickness is denoted by s, the time by t; A and C are rate factors, Q₁ and Q₂ are activation energies, R is the universal gas constant, and lastly T is the interface temperature between the metal and the oxide. The first differential equation models the thickness in the early stages before the first transition; the second describes the thickness after the first transition. The first transition is not described further in this thesis. We have chosen the thickness after 365 days as the output parameter and the variable C as our input parameter when doing IUQ. This means that the other constants have fixed values, the same as recommended by Forsberg et al. in [7]. The true value of C was assumed to be C_true = 2 · 10⁸. The uncertainty in the thickness is assumed to be 10%.

3.2.3 Neutron emission spectra model

The second real-world model used is a code that simulates the measurement of a neutron emission energy spectrum from reactions in a fusion plasma, given a set of plasma parameters. These four plasma parameters are T, the temperature of the plasma; I_th and I_bt, the fractions of neutrons interacting with the detector that originate from thermonuclear and beam-thermal (ions from NBI injection [21] reacting with the thermal plasma) reactions respectively; and I_bs, the fraction of the neutrons that scatter into the detector from the reactor wall. The output parameters of the model are the number of counts in each bin of the detector; we use the first 22 bins of the detector as the output. The detector whose measurement is simulated is a magnetic proton recoil detector [6]. The model starts by generating a synthetic measurement of a spectrum from the input parameters, which can be seen as the black markers in figure 2. It then fits the best values of the input parameters such that they match the generated spectrum. The best fit of these parameters is shown by the lines in the figure. Note that the x-axis of the figure is position, because it shows where in the detector the proton is detected.

Figure 2: The figure shows the result of a simulated measurement of a neutron emission energy spectrum from reactions in a fusion plasma. The black markers show the number of counts in each bin, and the red line shows the best fit with all input parameters. The blue line shows the best fit for the thermal parameters T and I_th. The green line shows the best fit for the parameter I_bt, and the black line shows the best fit for the parameter I_bs.

The true values of the input parameters used are T = 6 keV, I_th = 0.7, I_bt = 0.2, I_bs = 0.1. It is assumed that the number of counts in a bin is Poisson distributed, meaning that the standard deviation of the uncertainty for each count is the square root of the number of counts.

3.3 Benchmarking

This section describes the benchmarking of the IUQ methods. First, we describe how it is performed for the one-dimensional case with artificial functions. Then we describe how it is performed when both the input and output are two-dimensional, for an artificial function. Lastly, we describe how it is done for the two real-world examples. The methodology used for the benchmarking of the IUQ methods builds upon the ideas presented by Helgesson et al. in [18].

The basic idea starts by assigning a true value of x, denoted x_true. The parameter x_true can be determined either by arbitrarily picking a value, if it is not a real-world example, or by picking the agreed-upon true value if it is a real-world example. The next step is to randomly sample different values of y using equation (20) with x = x_true. Note that it is only the error term ε that is a random variable in equation (20), so since ε has a normal distribution, so does y, with standard deviation equal to σ. Each sampled value of y is called one reality. For each reality, all the different IUQ methods are used to estimate x and the uncertainty σ_x of this estimate. If a method is not able to find a solution for a reality, that reality is not used for calculating the result; this is mainly because WF does not always converge to a solution with the method implemented in this thesis. To compare how good the different IUQ methods are, four different measures are calculated.

The first measure is the relative evaluation bias d̄,

\[
\bar{d} = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i - x_{\text{true}}}{x_{\text{true}}}, \tag{49}
\]

where x_i is the estimated value of x_true for the i:th reality. The second measure is the relative standard deviation of the evaluation bias, σ_d,

\[
\sigma_d = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (d_i - \bar{d})^2}, \tag{50}
\]

where d_i = (x_i − x_true)/x_true. The third measure is the mean of the ratio between each reality's estimate of the standard deviation and the absolute standard deviation of the evaluation bias. This ratio is denoted R̄_σ and is calculated by

\[
\bar{R}_\sigma = \frac{1}{N} \sum_{i=1}^{N} \frac{\sigma_i}{\sigma_{\text{abs}}}, \tag{51}
\]

where σ_i is the IUQ method's estimate of the uncertainty for the i:th reality and σ_abs is the standard deviation of the absolute difference from the true x-value, which can be calculated by

\[
\sigma_{\text{abs}} = \sigma_d\, x_{\text{true}}. \tag{52}
\]

To estimate the spread of R_σ, the standard deviation of the ratio, σ_R, is also calculated by the formula

\[
\sigma_R = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (R_{\sigma,i} - \bar{R}_\sigma)^2}, \tag{53}
\]

where R_{σ,i} = σ_i/σ_abs. The last measure is χ²_true/N, defined as

\[
\frac{\chi^2_{\text{true}}}{N} = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - x_{\text{true}})^2}{\sigma_i^2}. \tag{54}
\]

These four measures are chosen because they give a good overview of how well each IUQ method performs. The evaluation bias d̄ shows whether a method tends to underestimate or overestimate. The standard deviation of the relative evaluation bias, σ_d, shows how much each method tends to deviate from the mean evaluation bias. Lastly, R̄_σ focuses on how good each method is at estimating the uncertainty, whereas χ²/N also includes effects due to the bias of the method. Both R̄_σ and χ²/N should be close to 1.
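For reference, the four one-dimensional measures could be computed from per-reality results roughly as follows (a sketch; the array names are hypothetical):

```python
import numpy as np

def benchmark(x_est, sigma_est, x_true):
    """x_est, sigma_est: per-reality estimates of x and of its uncertainty."""
    d_i = (x_est - x_true) / x_true
    d_bar = d_i.mean()                           # relative bias, equation (49)
    sigma_d = d_i.std(ddof=1)                    # its spread, equation (50)
    sigma_abs = sigma_d * x_true                 # equation (52)
    R_bar = (sigma_est / sigma_abs).mean()       # uncertainty ratio, equation (51)
    chi2_per_N = np.mean((x_est - x_true) ** 2 / sigma_est ** 2)   # equation (54)
    return d_bar, sigma_d, R_bar, chi2_per_N
```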

The only difference in the multidimensional case is that y is now assumed to follow a multivariate normal distribution, and the IUQ methods try to find the covariance of the estimate of x instead of the standard deviation. There is also one value of d̄, σ_d, and R̄_σ for each input parameter. However, χ²/N still has only one value, but the formula now becomes

\[
\frac{\chi^2_{\text{true}}}{N} = \frac{1}{N} \sum_{i=1}^{N} (\mathbf{x}_i - \mathbf{x}_{\text{true}})^T C_i^{-1} (\mathbf{x}_i - \mathbf{x}_{\text{true}}), \tag{55}
\]

where C_i in this case denotes the estimated covariance matrix of x for the i:th reality.

These four IUQ methods are first tested on the two artificial one-dimensional models and the oxide-layer case. In the artificial one-dimensional and oxide-layer cases, DS is tested with ensembles that match both two and four moments of y. The same holds for WF, where weights are found both for fixing two and for fixing four moments of ỹ. Then all four IUQ methods are tested in the case where both the output and the input are two-dimensional. Lastly, the four IUQ methods are tested on the neutron emission spectra case. MC uses 500 samples for each reality in all cases.

4 Results

This section shows the results from the benchmarking of the different IUQ methods. The results are presented with graphs that show the values of the four benchmarking measures d̄, σ_d, R̄_σ, and χ²/N for each of the four IUQ methods. Each figure also has graphs that display the runtime and the number of function calls for each IUQ method, which is also a very important metric since it shows how computationally expensive each method is.

4.1 One dimensional case

Figure 3 presents the results of the benchmarking of the IUQ methods for the case with the function y = x³, where the mean of y is ⟨y⟩ = 27 and the standard deviation of y is 15% of the expectation value. These values are all calculated from 1467 of the 2000 realities that were tried.

Figure 3: The figure shows the results from a one-dimensional benchmarking with the artificial function y = x³, x_true = 3, mean ⟨y⟩ = 27, and standard deviation of the uncertainty σ_y = 4.05. MC is Monte Carlo sampling of output parameters. MLE is maximum likelihood estimation with the Hessian matrix. DS 2M and DS 4M are deterministic sampling with an ensemble that has two respectively four moments the same as y. WF 2M and WF 4M are weight fixing that calibrates the weights of the ensemble such that it has two respectively four moments the same as y. The scales in (a-d) are linear while they are logarithmic in (e-f).

Figure 4 displays the same as figure 3, but in this case y = eˣ, ⟨y⟩ = e² = 7.389, and the standard deviation σ_y = 0.15⟨y⟩ = 1.108. These values are calculated from 1831 of the 2000 realities that were tried.
