
Linköpings universitet

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Datateknik

21 | LIU-IDA/STAT-A--21/019--SE

Statistical Analysis of Component Variations within Electronics

Oriol Garrobé Guilera

Supervisor: Hao Chi Kiang
Examiner: Annika Tillander


Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Electronics engineers rely on component tolerances to create functional designs. It is important to be aware that each component brings some uncertainty to the functionality of the assembly and therefore to the performance. To deal with this problem there are different approaches within the tolerance analysis field. Currently Veoneer uses the Worst Case Circuit Analysis methodology, which assumes that the worst scenario for every component is possible, achieving very safe but pessimistic results. Electronics design can therefore be optimized through statistical analysis.

This work focuses on modelling, by means of statistical techniques, the variance in the performance of a set of electronic components that compose a device or assembly. It uses different methodologies to find models that explain the behaviour of these assemblies and tests the results with different statistical tests to decide which model performs best. From this point the work focuses on finding the tolerance limits that are optimal for the design while fulfilling the safety requirements imposed by the automotive regulatory body. The result obtained substantially improves on the previous Worst Case Circuit Analysis model while being as safe as required. The methodology does not require huge amounts of data, which makes the process affordable for industry. Finally, it can be concluded that statistical tolerance analysis can substantially improve on the current results and boost the design of safety elements in automotive electronics while remaining as reliable as required.


Acknowledgments

I would like to thank my supervisor at Linköping University, Hao Chi Kiang, for his great support and guidance. He has always been available to help me work through any issue I encountered.

I would also like to thank my external supervisor at Veoneer, Peter Karlen, together with Gabriel Kulig and Johan Moleklint. My sincere gratitude for trusting me with this challenging thesis and for all the valuable feedback on my work.

Finally, I would like to thank my family and friends for being by my side throughout the degree; without them it would have been much more difficult and far less enjoyable.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
1.1 Problem Definition and Motivation
1.2 Aim
1.3 Research Questions
1.4 Delimitations
1.5 Related Work
1.6 Outline of the Work
2 Theory
2.1 Tolerance Analysis
2.2 Fitting Methods
2.3 Validation Methods
2.4 Cumulative Distribution Function Estimation
2.5 Quality Measures
3 Method
3.1 Data
3.2 Fitting Methods
3.3 Validation Methods
3.4 Workflow Diagram
4 Results
4.1 Fitting Methods
4.2 Validation Methods
5 Discussion
5.1 Results
5.2 Method
6 Conclusion
6.1 Summary and Critical Reflection
6.2 Future Work


List of Figures

3.1 Distribution of the 20000 simulated data points
3.2 Workflow Diagram
4.1 Normal distribution with calculated MLE parameters
4.2 Gamma distribution with calculated MLE parameters
4.3 Beta distribution with calculated MLE parameters
4.4 LogNormal distribution with calculated MLE parameters
4.5 Exponential distribution with calculated MLE parameters
4.6 Normal distribution with calculated MoM parameters
4.7 Gamma distribution with calculated MoM parameters
4.8 Beta distribution with calculated MoM parameters
4.9 LogNormal distribution with calculated MoM parameters
4.10 QQ plot of MLE fitted distributions
4.11 QQ plot of MoM fitted distributions
4.12 Probability of lower WCCA
4.13 Probability of lower WCCA
4.14 Probability of lower WCCA


List of Tables

2.1 Recommended CP values
3.1 First 5 rows of the simulated dataset
4.1 Comparison of Validation Measures
4.2 P-Value for the different LRTs
4.3 Probabilities of outside margins

1 Introduction

1.1 Problem Definition and Motivation

This Thesis work focuses on improving the current Tolerance Analysis model used for an assembly of electronic components, using statistical analysis to achieve optimized designs. The focus is on the Power Supply Unit (PSU). The PSU supplies a voltage to different components within a very restrictive voltage range specified by the manufacturer. This can be ensured through large design margins, but that is expensive; statistical analysis can lower these margins.

Veoneer is an automotive company focused on developing active safety electronics. Due to the nature of their work, their products must be extremely reliable and live up to the requirements; their proper functioning is vital to ensure the safety of car users. In this context, designing robust electronics for the automotive industry with high availability, high quality and long life length is challenging. In particular, the PSU has to ensure precise voltage supply to different control systems, such as the Electronic Control Unit (ECU) and the Controller Area Network (CAN).

The current solution analyzes the electrical design using a Worst Case Circuit Analysis (WCCA), which is also called deterministic or high-low tolerance analysis. Each electronic component is placed at its tolerance limits to make measurements as small or as large as possible, without considering the statistical distribution of the individual variables. This method yields very pessimistic results, i.e. very wide voltage ranges.

The results obtained with WCCA can therefore lead to an over-engineered design that is expensive and sometimes unnecessarily complex. To create a cost-optimized design while maintaining quality and fulfilling all design requirements, a statistical approach could be beneficial. The expectation is that by taking into consideration the probabilistic behaviour of the manufacturing processes and the probability distributions of the resulting assembly response functions, new insights can be obtained and a less pessimistic result reached. For the results to be usable as proof, the analysis needs to be based on industry best practices.


There are different strategies to analyze the output of the PSU and model it by means of statistical distributions. Maximum Likelihood Estimators (MLE) or the Method of Moments (MoM) can find fitted standard distributions. Using different non-parametric statistical tests that quantify the goodness of the fit, such as the Chi-Square Test [McH13], or statistical tests that compare different proposal distributions and select the most promising one, such as the Non-Nested Likelihood Ratio Test (NN-LRT) [NP33] or the Kolmogorov-Smirnov Test (KS) [PG81], the distribution that best models the output is chosen. Finally, the chosen fit is analyzed with process quality measures to determine whether the working span is within the required limits.

As a result, optimized and reliable assemblies - with increased component tolerances and lower fabrication costs - can be achieved by applying proper statistical tolerancing. This is possible when worst-case tolerancing is not a contract requirement, improving current designs in industry.

1.2 Aim

The purpose of this thesis is to investigate statistical models related to electronic design with regard to worst case, tolerance and quality analysis. These models should show a relevant performance improvement over the ones currently used at the company. The final goal is to suggest a suitable method and a way of working based on industry best practices. The model is intended to be used by designers to create optimized products by taking advantage of the principles of statistics to relax the component tolerances without sacrificing quality.

In particular, the work focuses on applying algorithms previously used in the literature to show that they are useful for tolerance analysis in the electronics field, on comparing these methods both to the one currently used and to each other, and on providing insights - strengths and weaknesses - into each one to help designers make an educated decision about which method best fits the interests of the company.

1.3 Research Questions

Based on the previous objectives, the research question that motivates this thesis is stated as follows:

Is it possible to improve the current performance model of a Power Supply Unit within an electronics assembly using Statistical Analysis to achieve optimized designs?

1. Does this model improve on the current WCCA baseline, and to what extent?

Using the Worst-Case Analysis model as a benchmark, is it possible to find a model based on Statistical Analysis that yields better results in terms of performance and manufacturing costs?

2. Can some restraints be relaxed?

Is it possible to design simpler or cheaper products based on the results provided by the statistical tolerance analysis?

3. Is this model reliable enough within the context of Automotive Safety?

Is it safe to assume that the design based on this statistical modelling will provide the expected results and take into account adverse conditions in order to ensure the safety of the passenger inside the vehicle?

1.4 Delimitations

This is a project that intends to compare different approaches to the same problem and discuss their strengths and weaknesses. It is therefore important to use the same data for all of them, as different data for different approaches would yield biased results. For this reason, the data used in this project is simulated data based on the knowledge of the components and the architecture of the assembly. Briefly, the data tries to be as realistic as possible, but it is synthetic.

The electronic components - which are the ones that bring uncertainty to the model - are assumed to work within some boundaries. The supplier guarantees the proper functioning of the component within a minimum and a maximum value. Also, the Automotive Industry requires any supplier to provide quality measures of the components. Using this information, the statistical distributions of these components can be assumed. To get a more realistic understanding of the global functioning of the assembly, having the real statistical distributions of the components would be optimal.

Finally, only the PSU is discussed, since it is a design element completely in Veoneer's control and with clear right/wrong criteria. This study therefore focuses only on this specific part of an electronic design; the rest of the assembly is out of scope.

Briefly, this is a theoretical study of a real world problem. It is limited by the assumptions and the lack of real data.

1.5 Related Work

This section reviews literature related to the topic in question attempting to put this Thesis work in a scientific context. It also intends to show other approaches to the same problem and the state of maturity of the field at the time of writing.

Tolerance Analysis refers to the study of tolerance stacking within assemblies. It is widely used for mechanical [She+05] and electrical [VV09] systems. In mechanical systems it is very important to study the different geometries [NT95] to ensure the movement of the assembly, sometimes focusing only on Kinematic Tolerance Analysis [JSS97]. In electrical systems it is mainly focused on power flow analysis [Bin+18] or on optimal electronic circuit design. This project is focused on the latter.

In this regard, there are two main Tolerance Analysis techniques: Worst Case Analysis (WCA) and Statistical Variation Analysis (SVA). WCA refers to the technique of looking for the limits of the assembly by adding up all the component limits; for electronic circuits in particular it is the so-called WCCA. This method of stacking up the tolerances is also called Stack-Up Analysis [Smi96]. On the other side, SVA uses the principles of statistics to relax the component tolerances by describing the behaviour of each component with a statistical distribution instead of a random value within some margins.

There are many different approaches that try to find the real behaviour of assemblies where each component introduces uncertainty. [FS00] uses the power of the Genetic Algorithm to try to find a very accurate estimate of the worst case limits. In [NT95] different methods of SVA are introduced, among them Monte Carlo simulation, which requires large amounts of data to achieve reliable approximations, and Bayesian methods that try to find the posterior distribution of the assembly based on expert prior knowledge. [Zha+13] introduces a method of studying the output value focusing on the operating characteristics of critical components.

In [DQ09] WCA and SVA methods are compared in the context of geometrical deviations in mechanical assemblies. Veoneer's current approach is based on WCCA, which is used in this work as a benchmark to compare against different SVA methods. These methods are the Method of Moments (MoM) and the Method of Maximum Likelihood (MLE).

On the goodness of the fit, there are many studies looking for the most efficient and reliable tests for statistical hypotheses. To choose the best fit, different approaches will be used. First a graphical approach, the Quantile-Quantile Plot (QQP), first introduced in the sixties in [WG68]; also a non-parametric approach such as the χ²-test [RS+15]. These methods will help to choose the potential best fits, which afterwards will be compared using the Likelihood Ratio Test (LRT) [NP33]. The latter will follow the methodology used in [LBG11], as the distributions in question are non-nested distributions; therefore a Non-Nested Likelihood Ratio Test (NN-LRT) is used.

At this point, the Delta Method is used to obtain estimates of the probability of an observation falling outside some limits. Following Automotive Industry best practices, a Process Capability Index (CP) value is computed using the previously mentioned limits. The CP value is known from the context of Six Sigma, a quality process theory introduced by Bill Smith while working at Motorola in 1986 [Ten01].

Briefly, this Thesis work aims to create a methodology to find the underlying statistical distribution of the PSU output. Different approaches widely studied in the literature are used and compared, highlighting their strengths and weaknesses. The work is expected to provide a reliable enough validation pipeline to ensure that the results obtained follow the Safety Requirements. All these requirements are stated in the Production Part Approval Process (PPAP), the regulatory framework for Automotive Safety.

1.6 Outline of the Work

The remainder of this report is organized as follows:

• Chapter 2 introduces the reader to the theoretical background related to the methods that are formulated and developed in this thesis.

• Chapter 3 describes the method in detail along with details about its implementation. This detailed description of the work should allow the reader to replicate the experiments and obtain similar results.

• Chapter 4 clearly reports the obtained results with as much detail as possible.

• Chapter 5 analyzes previously reported results. It also includes a critical discussion of the methodology used, a comparison between the different models developed and a final section discussing the impact in the industry.

2 Theory

To give a better understanding of this work, the following chapter summarizes and describes the theory that this work builds on.

2.1 Tolerance Analysis

Tolerance Analysis refers to the techniques, used in product design and manufacturing processes, that try to understand how variability or imperfections in components propagate across the design; in particular, how the variation of a set of inputs affects the output of interest in systems that accumulate variation. An example of such systems are electrical systems.

Proper Tolerance Analysis is important because it can improve the cycle time and performance of products and it can reduce costs [KSY14]. It studies how variation affects the capability of a product to meet customer expectations, helps avoid problems that lead to rejected parts, and shows whether engineering specifications are met or, on the contrary, violate requirements. It can also give higher safety and less risk of quality issues.

Generally speaking, there are two fundamentally different analysis tools to study tolerance variation: Stack-Up Tolerances and Statistical Variation Analysis.

Stack-Up Tolerances

Tolerance stackups or WCA describe the problem-solving method of calculating the effects of the accumulated variation that is allowed by specified tolerances. It assumes that all parts lie within the specified worst limits when used for tolerance allocation [Fis11].

This method is the simplest and most conservative of the classical approaches and it does not consider the statistical distribution of the individual variables. The idea is to predict the maximum expected variation of the measurement where the variables do not exceed some specified limits. For instance, it could be specified that the power output of an assembly must be within p1 and p2 power units. The proper functioning of the device needs the power output to be in this interval.

Each electronic component is placed at its tolerance limits to make measurements as small or as large as possible. This guarantees the proper functioning of the assembly 100 percent, i.e. the assembly output behaves within range with a probability of being outside the limits equal to 0, regardless of the actual component variation.

There is a major drawback to this, which is that WCCA requires very tight individual component tolerances, causing expensive manufacturing, longer lead times, higher manufacturing risk and overly complex devices. As the number of parts in the assembly increases, the component tolerances must be greatly reduced in order to meet the assembly limit, which results in higher production costs.

Statistical Variation Analysis

The SVA model uses statistical theory to relax component constraints without losing performance. The distributions of the different components are combined through the assembly to find a global distribution. The idea is to model the output distribution that describes the assembly variation, allowing the engineer to design a product to different quality levels.

There are two ways of modelling the variation. The first combines the tolerance limits of every component, creating a composite of distributions. The second is based on simulation, where a random value is drawn from the distribution of each component and, adding all of these together, an output value is obtained. The latter approach is called Monte Carlo simulation [Ray08].

Monte Carlo simulation refers to a statistical method used to solve complex mathematical problems by generating random variables. Monte Carlo's method analyses the risk by creating possible outcomes. It computes the result any number of times, every time with a different set of values. These values come from the probability distributions of each of the inputs. Doing this many times, it is possible to observe the underlying shape of the target variable. Depending on the number of uncertainties - components with variability - and the ranges specified, it can be necessary to do thousands of computations.

This thesis work will focus on the second approach. It is the most popular method for nonlinear statistical tolerancing and lends itself well to the case where the component parameters have distributions other than Gaussian. The disadvantage of the simulation approach is that large numbers of samples are needed to get accurate estimates and, depending on how big the assembly is, it could be too computationally expensive [NT95].
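To make the simulation approach concrete, the sketch below (in Python, the language used for the simulations later in this work) propagates a few hypothetical component distributions through an assumed response function and reads off the spread of the output. The component names, tolerances and the response function are invented for illustration only and are not the PSU model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # number of Monte Carlo samples

# Hypothetical components: nominal values and tolerances are assumptions.
r1 = rng.normal(loc=10_000, scale=50, size=n)      # resistor [Ohm]
r2 = rng.normal(loc=4_700, scale=25, size=n)       # resistor [Ohm]
v_ref = rng.uniform(low=1.19, high=1.21, size=n)   # reference voltage [V]

# Assumed assembly response function: output voltage of a feedback divider.
v_out = v_ref * (1.0 + r1 / r2)

print(f"mean = {v_out.mean():.4f} V, std = {v_out.std(ddof=1):.4f} V")
print("99.9% empirical interval:", np.percentile(v_out, [0.05, 99.95]))
```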

2.2 Fitting Methods

The methods that will be used to fit a model to the data are the following.

Method of Maximum Likelihood

This method selects as estimates the values of the parameters that maximize the likelihood function (the joint probability function or the joint density function) of the observed sample. The likelihood function represents a hyper-surface whose peak, if it exists, represents the combination of parameter values that maximizes the probability of drawing the sample obtained.


The likelihood function measures the goodness of fit of a statistical model to sample data for given values of the unknown parameters. In the continuous case, let Y be a random variable following a continuous probability distribution with density function f(y_i) with parameter θ; the likelihood function can be represented as

\mathcal{L}(\theta \mid y_1, y_2, \ldots, y_n) \quad (2.1)

The likelihood depends on k parameters θ_1, θ_2, ..., θ_k, and the method chooses the parameter vector θ that maximizes \mathcal{L}, which is defined as

\mathcal{L}(\theta \mid y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i), \quad (2.2)

where \prod is the product operator.

For numerical stability it is easier to use the log-likelihood function instead: when maximizing a product of many small values, a single small factor can dominate the entire product, which makes the optimization difficult. Therefore,

\ell = \log(\mathcal{L}(\theta \mid y_1, y_2, \ldots, y_n)) = \sum_{i=1}^{n} \log(f(y_i)) \quad (2.3)

Finally, the MLE estimate is

\hat{\theta}_{MLE} = \arg\max_{\theta} \ell, \quad (2.4)

where \arg\max_{\theta} denotes the parameter vector θ that maximizes ℓ.

The estimators obtained using this method are usually good and reliable. To obtain them it is necessary to select a family of distributions large and flexible enough to fit the data. Also, it is not guaranteed that the resulting estimate is a global optimum; it is possible that the estimate found is only a local optimum.
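As a minimal illustration of how the maximization in equation 2.4 can be carried out numerically, the sketch below minimizes the negative log-likelihood of an assumed Gamma model on synthetic data with scipy; the data, starting values and optimizer choice are all assumptions made for the example.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
y = rng.gamma(shape=3.0, scale=2.0, size=500)  # synthetic sample (assumption)

# Negative log-likelihood of a Gamma(alpha, beta) model, scale parameterization.
def neg_log_likelihood(params, data):
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    return -np.sum(stats.gamma.logpdf(data, a=alpha, scale=beta))

# Maximize the log-likelihood (equation 2.4) by minimizing its negative.
result = optimize.minimize(neg_log_likelihood, x0=[1.0, 1.0], args=(y,),
                           method="Nelder-Mead")
alpha_hat, beta_hat = result.x
print(alpha_hat, beta_hat)
```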

Method of Moments

To understand the MoM it is important to review what the population moments are. The moment of a function is a quantitative value that is used to describe the characteristics of a distribution. More particularly, a central moment is a moment of a probability distribution of a random variable about the random variable's mean. A standardized moment is a moment that is normalized, typically by dividing by the appropriate power of the standard deviation, which renders the moment scale invariant. The most commonly used moments are the first 4 moments, as they give a lot of insight into the distribution. These are:

• 1st Central Moment - Mean. The mean measures the central tendency of a random variable characterized by a distribution.

• 2nd Central Moment - Variance. The variance is the expectation of the squared devi-ation of a random variable from its mean, i.e. it measures how far a set of numbers is spread out from their average value.

• 3rd Standardized Moment - Skewness. The skewness measures the degree of asymmetry of the probability distribution of a real-valued random variable about its mean.

• 4th Standardized Moment - Kurtosis. The kurtosis measures how heavy the tails of a probability distribution of a real-valued random variable are. Higher kurtosis corresponds to greater extremity of deviations - heavier tails.


The MoM is a very simple procedure for deriving point estimators. The idea behind this method is that the sample moments should provide good estimates of the population moments. Having a random sample of n observations, y_1, y_2, ..., y_n, the kth moment is

\mu_k = E(Y^k), \quad (2.5)

and the corresponding sample moment is

m_k = \frac{1}{n} \sum_{i=1}^{n} y_i^k \quad (2.6)

To estimate the unknown parameters it is necessary to equate the kth moment of the chosen family of distributions to the corresponding sample moment, such that

\mu_k = m_k, \quad k = 1, 2, \ldots, t, \quad (2.7)

where t is the number of parameters to be estimated. The MoM finds the solution of equation 2.7, where \mu_k is a function of the parameters of each family of distributions.

Estimators obtained using the MoM are usually consistent estimators of their respective parameters, although they are not necessarily the best estimators. Often the resulting estimators are functions of sufficient statistics. A statistic is sufficient for a model relative to the unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter" [Fis22]. The method is intuitive, easy to apply and yields reasonable estimators.
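A minimal sketch of the idea, on synthetic data, is shown below: the first two sample moments are computed and then matched to the population moments of a Normal and of a Gamma model. The data and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.gamma(shape=3.0, scale=2.0, size=500)  # synthetic sample (assumption)

# First two sample moments (equation 2.6).
m1 = y.mean()
m2 = np.mean(y ** 2)
sample_var = m2 - m1 ** 2

# Normal: equate mu and mu^2 + sigma^2 to the sample moments.
mu_hat, sigma2_hat = m1, sample_var

# Gamma (scale parameterization): mean = alpha*beta, variance = alpha*beta^2.
alpha_hat = m1 ** 2 / sample_var
beta_hat = sample_var / m1

print(mu_hat, sigma2_hat, alpha_hat, beta_hat)
```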

2.3 Validation Methods

The methods that will be used to check the goodness of the fitted models are the following.

Quantile-Quantile Plot

The Quantile-Quantile (QQ) Plot is a nonparametric graphical technique that provides an assessment of the goodness of the fit. The QQ Plot was first introduced in the sixties in [GW68].

It compares probability distributions by plotting their quantiles against each other, i.e. the quantiles of the target distribution are on the horizontal axis and the ordered values of the sampling distribution are on the vertical axis. It allows comparing more than one distribution by plotting them together on the same graph and deciding which one behaves best.

The optimal case would be that the distributions compared are exactly equal - the quantiles have the exact same values as the sampling quantiles - and the points in the QQ plot would form a straight line x = y. To make this even clearer, a 45° reference dashed line is also plotted. On the other hand, if the points fall far from the reference line, it means that they come from a different distribution.

The advantages of a QQ Plot are that the samples compared can differ in size and that many aspects of the distribution are seen at the same time, such as changes in symmetry, shifts in location or scale, and the presence of outliers.

Pearson's χ² Test

The Chi-square test statistic was introduced in 1900 by Karl Pearson [RS+15] and is a function of the weighted squared deviations of the observed values from their expected values:

Y^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}, \quad (2.8)

where k stands for the number of partitions of the data and is directly related to the degrees of freedom (df) of the χ² distribution, and \sum is the summation operator. The df are the maximum number of values which have freedom to vary in a sample. On the other hand, n_i represents the discrete observed values and E(n_i) represents the expected values.

Pearson showed that under mild conditions and for large values of n_i, Y^2 \sim \chi^2(k-1).

It can be used to test whether sample data indicate that a specific model for the population distribution does not fit the data; this is called a goodness-of-fit test, with a null hypothesis of the form H_0 : F = F_0, where F_0 is a probability distribution.

To decide when to accept or reject the null hypothesis it is necessary to specify the so-called level of the test, α, which is the probability of a type I error, i.e. rejecting H_0 when it is true. The rejection region is determined by the degrees of freedom of the test statistic and the value of α. The null hypothesis H_0 is rejected when the value of the test statistic falls in the rejection region.

If the test is to be applied to a continuous distribution, this distribution has to be discretized by defining a set of more than 5 bins. Two standard methods often used are bins of equal size and bins with equal probabilities under the null hypothesis.

Various studies have shown that when applied to data from a continuous distribution it is generally inferior to other methods such as the Kolmogorov-Smirnov Test, because it is less powerful than other tests. However, it is still the go-to test in many applied fields [RG20].

Likelihood Ratio Test

The Likelihood Ratio Test (LRT) is used to compare the goodness of the fit of two competing families of distributions based on their likelihoods; the idea behind it is to choose the model that best fits the data [LBG11].

Recall that the power of a test is the probability that the test rejects the null hypothesis H_0 when the alternative hypothesis H_1 is true, i.e. rejects correctly; thus the test with the greatest power is the best test possible. The power of a test was introduced with the Neyman-Pearson Lemma in [NP33]. The LRT is therefore a very good choice for comparing two models, as it has the highest power.

Given a random sample from a distribution with likelihood \mathcal{L}(\theta \mid y_1, \ldots, y_n), let θ be a vector of k parameters θ = (θ_1, ..., θ_k). The hypotheses to be tested are

H_0 : \theta \in \Omega_0
H_1 : \theta \in \Omega_1, \quad (2.9)

where Ω_0 and Ω_1 are partitions of the parameter space. Let \mathcal{L}(\hat{\omega}_0 \mid y_1, \ldots, y_n) denote the best explanation of the observed data for θ ∈ Ω_0. Similarly, let \mathcal{L}(\hat{\omega} \mid y_1, \ldots, y_n) be the best explanation for θ ∈ Ω = Ω_0 ∪ Ω_1. If \mathcal{L}(\hat{\omega}_0 \mid y_1, \ldots, y_n) = \mathcal{L}(\hat{\omega} \mid y_1, \ldots, y_n) holds, the best explanation can be found in Ω_0 and H_0 cannot be rejected.

Define λ by

\lambda = \frac{\mathcal{L}(\hat{\omega}_0 \mid y_1, \ldots, y_n)}{\mathcal{L}(\hat{\omega} \mid y_1, \ldots, y_n)}, \quad (2.10)

then λ is the test statistic and the rejection region is determined by λ ≤ k. The value of k is chosen so that the test has the desired value of α. Note that the model specified by θ ∈ Ω_0 is nested within the model specified by θ ∈ Ω = Ω_0 ∪ Ω_1.

Sometimes λ is not a test statistic with a known probability distribution; however, for large enough samples and under some conditions - such as the existence of derivatives of the likelihood function with respect to the parameters - it can be approximated. Let y_1, y_2, ..., y_n have joint likelihood function \mathcal{L}(\theta \mid y_1, \ldots, y_n). Let r_0 denote the number of free parameters specified by H_0 : θ ∈ Ω_0 and let r denote the number of free parameters specified by the statement θ ∈ Ω. Then, for large n, -2\log(\lambda) has approximately a χ² distribution with r - r_0 df. That is, a large-sample likelihood ratio test rejection region can be written as

Q = -2\log(\lambda) > \chi^2_{\alpha}, \quad \text{where } \chi^2_{\alpha} \text{ is based on } (r - r_0) \text{ df.} \quad (2.11)

2.4 Cumulative Distribution Function Estimation

The objective of this work is to find the probability of an observation falling outside some given margins. This probability can be found by means of the Cumulative Distribution Function (CDF) of the family of distributions chosen previously. With the estimated parameters of the chosen distribution family it is possible to make inference about the population.

However, it is not possible to obtain this probability directly from the model with the estimated parameters, because an estimate has some variability. We therefore try to model the variability of the CDF by means of the Delta Method (DM). The DM provides a framework to obtain a frequency distribution around the CDF values.

The DM states that it is possible to approximate the asymptotic behaviour of a function of some parameters under certain assumptions, and it is based on the Central Limit Theorem (CLT). The Lindeberg-Lévy CLT [Le 86] states that for an i.i.d. sequence y_1, y_2, ..., y_n of random variables with E[y_i] = µ and Var[y_i] = σ² < ∞, as n → ∞ the random variables \sqrt{n}(\bar{y}_n - \mu) converge in distribution to a normal N(0, σ²),

\sqrt{n}(\bar{y}_n - \mu) \xrightarrow{d} N(0, \sigma^2) \quad (2.12)

More in depth, the DM states the following: suppose U_n are real random variables and a_n is a sequence of constants with a_n → ∞ as n → ∞; if there exist a constant u and a random variable V such that

• a_n(U_n - u) \xrightarrow{d} V, with V ~ N(0, Σ), and
• g is differentiable at u with gradient ∇g(u),

then

a_n\big(g(U_n) - g(u)\big) \xrightarrow{d} N\!\left(0, \nabla g(u)^{\top} \Sigma \, \nabla g(u)\right), \quad (2.13)

where u are the constant population parameters, Σ is the covariance matrix of U_n and ∇g(u) is the gradient.

From this point, the DM will be used to find the distribution that underlies the probability of an observation falling outside the specified boundaries. It will also provide enough information to define the number of observations necessary to reach industry specifications regarding uncertainty, as the mentioned distribution can be a function of the number of observations.
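As one possible numerical sketch of this use of the DM, the code below estimates P(Y ≤ limit) for an assumed Normal sample and approximates its variance with the delta method, parameterizing in the sample mean and sample variance so that the standard variance results for those statistics can be used; the limit and the data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(loc=11.0, scale=0.3, size=2000)  # assumed Normal sample
n = y.size
limit = 12.0                                     # upper limit, illustrative value

mu_hat = y.mean()
var_hat = y.var(ddof=1)
sigma_hat = np.sqrt(var_hat)

# g(mu, sigma^2) = P(Y <= limit) = Phi((limit - mu) / sigma)
z = (limit - mu_hat) / sigma_hat
p_hat = stats.norm.cdf(z)

# Gradient of g with respect to (mu, sigma^2).
dg_dmu = -stats.norm.pdf(z) / sigma_hat
dg_dvar = -z * stats.norm.pdf(z) / (2.0 * var_hat)
grad = np.array([dg_dmu, dg_dvar])

# Asymptotic covariance of (ybar, s^2) under normality (off-diagonals are zero).
cov = np.diag([var_hat / n, 2.0 * var_hat ** 2 / (n - 1)])

# Delta-method variance of the estimated CDF value.
var_p_hat = grad @ cov @ grad
print(f"P(Y <= {limit}) ~ {p_hat:.5f} +/- {np.sqrt(var_p_hat):.5f} (1 sd)")
```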

2.5 Quality Measures

Process Capability Index

The Process Capability Index (CP) is a statistical measure of the ability of a process to produce an output within specified limits. It relates the process variation to its specification limits, telling how much variation there is relative to what is allowed, and it allows comparing processes. CP is known from the context of Six Sigma, introduced by Bill Smith while working at Motorola in 1986 [Ten01].

The Upper Specification Limit (USL) and the Lower Specification Limit (LSL) are the maximum and minimum acceptable values of a variable - both help define the performance of a device. \hat{\sigma} is the estimated variability (standard deviation) of the process, and the CP value can be defined as

CP = \frac{USL - LSL}{6\hat{\sigma}}. \quad (2.14)

The minimum CP value permitted for suppliers in the Automotive Industry for Safety Elements out of Context, such as the ones that this Thesis work is focused upon, is 1.67 according to PPAP. This value is also recommended for critical parameters in [Mon20] and in [Kam+17]. Veoneer follows the PPAP rules, and therefore the minimum value accepted for this work is 1.67; any value below that does not meet industry requirements.

Booker, in [Boo+01], agrees with this criterion and recommends certain values for different processes and circumstances. Table 2.1 shows these values and confirms that for two-sided safety elements - minimum and maximum values specified - of a new process the CP value should be 1.67.

Situation                                            Two-sided specifications   One-sided specifications
Existing process                                     1.33                       1.25
New process                                          1.50                       1.45
Safety or critical parameter for existing process    1.50                       1.45
Safety or critical parameter for new process         1.67                       1.60
Six Sigma process                                    2.00                       2.00

Table 2.1: Recommended minimum CP values.
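A minimal sketch of the CP computation of equation 2.14 is shown below; the simulated data and the specification limits are illustrative assumptions, not the real PSU limits.

```python
import numpy as np

def process_capability_index(data, lsl, usl):
    """CP = (USL - LSL) / (6 * sigma_hat), equation 2.14."""
    sigma_hat = np.std(data, ddof=1)
    return (usl - lsl) / (6.0 * sigma_hat)

rng = np.random.default_rng(4)
power = rng.normal(loc=11.0, scale=0.25, size=5000)  # simulated output [W], illustrative
cp = process_capability_index(power, lsl=9.5, usl=12.5)
print(f"CP = {cp:.2f}  (minimum required: 1.67)")
```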

3 Method

3.1 Data

Electronic Schematics

The first step in understanding how the PSU works is to look at the schematics of the device. The schematics provide a drawing of all the connections and electronic components that are part of the PSU and make it work properly. This is vital information for understanding how the PSU provides the output value given any input. The schematics are the product of the design and are created by the electronics design engineers.

Data Sheet of Components

The data sheet of a component is provided by the component manufacturer. Each electronic component present in the assembly has its own properties that are very relevant to the modelling of the output. The data sheet contains all the parameters relevant to the proper functioning of the component, its boundaries and tolerances. In particular, the suppliers need to provide the nominal value, i.e. the expected functioning value of the device, along with the CP value. With this information it is possible to obtain the approximate distribution of all the related electronic components.

Simulated Data

The data used in this project is simulated in its totality. From an input variable - the battery supply - there is a function that maps this input to an output, adding all the uncertainties that each component brings to the function. This function is the result of all the connections and all the electronic components that are involved on the path of the power from the battery of the car to the electronic devices that the PSU supplies. In other words, all the modifications that the power undergoes are mapped in this process, and the result depends on how the electronic components are arranged together with the particular distribution of the tolerances of each component.

This uncertainty is modeled using the Python language and a Monte Carlo simulation. The required power is computed 20000 times, each time for a different combination of component values. The simulation provides a data frame with all the inputs, i.e. a table that contains, for each simulated observation, all the inputs coming from random sampling of their respective distributions, together with the target output variable. This table is the one used for all the computations in this work.

In particular, the table is formed by 10 columns, including the target variable, and as many rows as simulated data points; the number of rows is not fixed. The variables in the table are:

Input

• Vout. Required voltage output value [Volt].
• Iout. Required current output value [Ampere].
• Vin. Required voltage input value [Volt].
• LSRon. Metal oxide semiconductor field effect transistor (MOSFET) resistance value [mOhm].
• L. Inductance value [micro Henry].
• Fsw. Switch frequency [kHertz].
• Vbodydiode. Required voltage diode value.
• DCR. Direct Current Resistance value [mOhm].
• PIC. Power Management Integrated Circuit (PMIC) power dissipated [mWatt].

Output

• Pin. Required Power. Target value.

An example of the table can be seen in Table 3.1. It contains 5 example data points and shows how the inputs are mapped to the output.

Vout    Iout    Vin     LSRon    L        Fsw     Vbodydiode  DCR      PIC      Pin
3.28E0  3.18E0  1.34E1  8.16E-3  4.58E-6  4.57E5  7.49E-1     1.81E-2  8.81E-2  11.34
3.31E0  3.09E0  1.33E1  8.15E-3  4.51E-6  4.63E5  8.74E-1     2.10E-2  8.90E-2  11.13
3.29E0  2.98E0  1.33E1  8.18E-3  4.46E-6  4.53E5  7.82E-1     2.10E-2  8.89E-2  10.70
3.30E0  2.94E0  1.34E1  8.19E-3  4.52E-6  4.56E5  7.47E-1     1.88E-2  8.65E-2  10.56
3.29E0  3.01E0  1.33E1  8.40E-3  4.44E-6  4.54E5  7.83E-1     2.08E-2  8.87E-2  10.82

Table 3.1: First 5 rows of the simulated dataset.

In particular, 20000 iid points are simulated, since at this point it is not very computationally expensive to generate such a dataset. Later in this work it will be necessary to be very careful with the number of observations used; however, at this stage it is possible to work with this number of points to obtain good estimates. The distribution of the data can be seen in Figure 3.1. It can be seen that the data is not standardized.


Figure 3.1: Distribution of the 20000 simulated data points.
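The real component distributions and the mapping from the inputs to Pin are internal to the design, so the sketch below only illustrates how a table with the columns of Table 3.1 could be assembled with NumPy and pandas; the distributions and the simplified power balance used for Pin are assumptions for illustration, not the actual simulation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 20_000

# Illustrative input distributions; the real tolerances are not reproduced here.
df = pd.DataFrame({
    "Vout": rng.normal(3.3, 0.01, n),          # [V]
    "Iout": rng.normal(3.0, 0.08, n),          # [A]
    "Vin": rng.normal(13.35, 0.05, n),         # [V]
    "LSRon": rng.normal(8.2e-3, 1e-4, n),      # [Ohm]
    "L": rng.normal(4.5e-6, 5e-8, n),          # [H]
    "Fsw": rng.normal(4.55e5, 4e3, n),         # [Hz]
    "Vbodydiode": rng.uniform(0.70, 0.90, n),  # [V]
    "DCR": rng.normal(2.0e-2, 1.5e-3, n),      # [Ohm]
    "PIC": rng.normal(8.8e-2, 1e-3, n),        # [W]
})

# Simplified, assumed power balance: output power plus conduction and controller losses.
df["Pin"] = (df["Vout"] * df["Iout"]
             + df["Iout"] ** 2 * (df["LSRon"] + df["DCR"])
             + df["PIC"])

print(df.head())
```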

3.2 Fitting Methods

Exponential Family

The Exponential Family is a parametric set of probability distributions with a certain shape that has several properties that make it extremely useful for statistical analysis.

The exponential family is known to unify many of the most common, widely used distributions, such as:

• Normal Distribution
• Gamma Distribution
• Beta Distribution
• LogNormal Distribution
• Exponential Distribution

One of the most important features of the exponential family distributions is the existence of a sufficient statistic whose dimension is independent of the sample size. It can summarize arbitrary amounts of data using a fixed number of values. This sufficient statistic can be easily obtained through the likelihood function.

It is also important to note that the MLE behaves nicely in this setting: the MLE equations set the expected value of the sufficient statistic equal to its observed value, and the log-likelihood function of distributions from this family is concave. For all these reasons, distributions from this family are selected to fit the data, using the following methods.
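As a minimal sketch, the candidate families listed above can be fitted to the target variable with scipy's built-in maximum likelihood fitting, which also estimates the loc and scale shifts discussed below; the stand-in data here is synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
pin = rng.normal(11.0, 0.3, 20_000)   # stand-in for the simulated Pin column

candidates = {
    "normal": stats.norm,
    "gamma": stats.gamma,
    "beta": stats.beta,
    "lognormal": stats.lognorm,
    "exponential": stats.expon,
}

for name, dist in candidates.items():
    params = dist.fit(pin)             # MLE fit; loc/scale shift the support as needed
    loglik = np.sum(dist.logpdf(pin, *params))
    print(f"{name:12s} params={np.round(params, 4)}  log-likelihood={loglik:.1f}")
```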

Method of Maximum Likelihood

The MLE needs to assume a family of distributions to find the likelihood function and then use an optimization algorithm to find the value for the parameters that will be estimators of the population given the sample. The distributions that are fitted using the MLE are the following.


Normal Distribution

The Normal or Gaussian distribution is a continuous probability distribution for a real-valued random variable. It is symmetric with support y ∈ R, i.e. its values can be any real number.

A random variable Y follows a Normal probability distribution if the density function of Y is

f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(y-\mu)^2 / (2\sigma^2)}, \quad -\infty < y < \infty, \quad (3.1)

for some σ > 0 and -∞ < µ < ∞.

In a Normally distributed sample of random variables having mean µ and variance σ², following the probability density function (3.1), the two parameters that need to be estimated are µ and σ².

To find the log likelihood function that will provide the tools to estimate these parameters, 3.1 and 2.3 are combined, obtaining the following result,

\ell = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2 \quad (3.2)

Taking the partial derivatives of 3.2 with respect to µ and σ², equating them to 0 and solving the system, the MLE estimators obtained are

\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2, \quad (3.3)

where \bar{y} is the sample mean.

Gamma Distribution

The Gamma distribution is a continuous probability distribution. It is known as the maximum entropy probability distribution for a random variable Y with support y ∈ (0, ∞), so it is necessary to process the data so that the smallest value starts at 0+. To do so, the parameter loc is introduced, which stands for the shift of the data.

A random variable Y follows a Gamma probability distribution if the density function of Y is

f(y) = \begin{cases} \dfrac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\Gamma(\alpha)}, & 0 \le y < \infty \\ 0, & \text{elsewhere,} \end{cases} \quad (3.4)

where

\Gamma(\alpha) = \int_{0}^{\infty} y^{\alpha-1} e^{-y} \, dy, \quad (3.5)

for some α > 0 and β > 0.

Following this probability density function, the two parameters that need to be estimated are α and β.


To find the log likelihood function that will provide the tools to estimate these parameters, 3.4 and 2.3 are combined, obtaining the following result,

\ell = (\alpha-1)\sum_{i=1}^{n}\log(y_i) - \frac{1}{\beta}\sum_{i=1}^{n} y_i - n\alpha\log(\beta) - n\log(\Gamma(\alpha)) \quad (3.6)

Taking the partial derivatives of 3.6 with respect to α and β, equating to 0 and solving the system, the MLE estimators obtained are,

\hat{\beta}_{MLE} = \frac{\bar{y}}{\hat{\alpha}_{MLE}}, \qquad \frac{\Gamma'(\hat{\alpha}_{MLE})}{\Gamma(\hat{\alpha}_{MLE})} + \log(\hat{\beta}_{MLE}) = \frac{1}{n}\sum_{i=1}^{n}\log(y_i), \quad (3.7)

where Γ' is the derivative of Γ.

There is no closed-form solution to this system. Values for the estimators \hat{\alpha}_{MLE} and \hat{\beta}_{MLE} can be found by means of an iterative optimization algorithm.
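A minimal sketch of such an iterative solution, assuming the scale parameterization of equation 3.7 and synthetic data, substitutes β = ȳ/α and solves the remaining one-dimensional equation with a root finder and the digamma function.

```python
import numpy as np
from scipy import optimize, special

rng = np.random.default_rng(7)
y = rng.gamma(shape=3.0, scale=2.0, size=2000)   # synthetic positive data

ybar = y.mean()
rhs = np.log(ybar) - np.mean(np.log(y))          # log(ybar) - (1/n) sum log(y_i)

# Substituting beta = ybar / alpha into equation 3.7 gives
#   log(alpha) - psi(alpha) = log(ybar) - mean(log y),
# which is solved for alpha with a one-dimensional root finder.
alpha_hat = optimize.brentq(lambda a: np.log(a) - special.digamma(a) - rhs,
                            1e-6, 1e6)
beta_hat = ybar / alpha_hat
print(alpha_hat, beta_hat)
```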

Beta Distribution

The Beta distribution is a continuous probability distribution. It is widely used for modelling the behaviour of random variables with finite support, as it is defined on the interval y ∈ [0, 1], so it is necessary to preprocess the data so that the whole dataset is scaled into this interval. For this purpose, two parameters are introduced: loc, which describes the shift of the data, and scale, which describes the scaling of the data.

A random variable Y follows a Beta probability distribution if the density function of Y is,

f(y) = \begin{cases} \dfrac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha,\beta)}, & 0 \le y \le 1, \\ 0, & \text{elsewhere,} \end{cases} \quad (3.8)

where

B(\alpha, \beta) = \int_{0}^{1} y^{\alpha-1}(1-y)^{\beta-1} \, dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \quad (3.9)

for some α > 0 and β > 0.

In a Beta distributed sample of random variables that follows this probability density function, the two parameters that need to be estimated are α and β.

To find the log likelihood function that will provide the tools to estimate these parameters, 3.8 and 2.3 are combined, obtaining the following result,

\ell = n\log(\Gamma(\alpha+\beta)) - n\log(\Gamma(\alpha)) - n\log(\Gamma(\beta)) + (\alpha-1)\sum_{i=1}^{n}\log(y_i) + (\beta-1)\sum_{i=1}^{n}\log(1-y_i) \quad (3.10)


Taking the partial derivatives of 3.10 with respect to α and β and equating them to 0 gives the system

\frac{n\Gamma'(\alpha+\beta)}{\Gamma(\alpha+\beta)} - \frac{n\Gamma'(\alpha)}{\Gamma(\alpha)} + \sum_{i=1}^{n}\log(y_i) = 0

\frac{n\Gamma'(\alpha+\beta)}{\Gamma(\alpha+\beta)} - \frac{n\Gamma'(\beta)}{\Gamma(\beta)} + \sum_{i=1}^{n}\log(1-y_i) = 0 \quad (3.11)

There is no closed-form solution to this system of equations, so it is solved for \hat{\alpha}_{MLE} and \hat{\beta}_{MLE} iteratively, using an optimization method.

LogNormal Distribution

The LogNormal distribution is a continuous probability distribution. The most characteristic feature of this distribution is that its logarithm is normally distributed. A LogNormally distributed random variable can only take positive values, i.e. the support is y ∈ (0, +∞). It is therefore necessary to process the data so that the smallest value of the dataset is as close as possible to 0+.

A random variable Y follows a LogNormal probability distribution if the density function of Y is,

f(y) = \frac{1}{y\sigma\sqrt{2\pi}} \, e^{-(\log(y)-\mu)^2 / (2\sigma^2)}, \quad (3.12)

for some µ ∈ (-∞, +∞) and σ > 0.

In a LogNormal distributed sample of random variables, where µ and σ² are the mean and variance of the logarithm of the variable, the two parameters that need to be estimated are µ and σ².

To find the log likelihood function that will provide the tools to estimate these parameters, 3.12 and 2.3 are combined, obtaining the following result,

\ell = -\sum_{i=1}^{n}\log(y_i) - \frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(\log(y_i) - \mu\right)^2 \quad (3.13)

Taking the partial derivatives of 3.13 with respect to µ and σ2, equating to 0 and solving the system, the MLE estimators obtained are,

\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\log(y_i), \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(\log(y_i) - \hat{\mu}_{MLE}\right)^2 \quad (3.14)

Exponential Distribution

The Exponential distribution is a continuous probability distribution which is the continuous analogue of the geometric distribution. The distribution is supported on the interval [0, ∞), so the data again needs to be shifted; to do so, the parameter loc is introduced, which stands for the shift of the data.

A random variable Y follows an Exponential probability distribution if the density function of Y is

f(y) = \begin{cases} \lambda e^{-\lambda y}, & y \ge 0 \\ 0, & y < 0, \end{cases} \quad (3.15)

for some λ > 0.

To find the log likelihood function that will provide the tools to estimate these parameters, 3.15 and 2.3 are combined, obtaining the following result,

\ell = n\log(\lambda) - \lambda\sum_{i=1}^{n} y_i \quad (3.16)

Taking the derivative of 3.16 with respect to λ, equating to 0 and solving the system, the MLE estimator obtained is,

\hat{\lambda}_{MLE} = \frac{n}{\sum_{i=1}^{n} y_i} \quad (3.17)

Method of Moments

The first step in using the MoM to find a distribution given the data is to compute the sample moments of the data. However, as seen previously, the data is not standardized, and for some distributions it is necessary to preprocess the data before the MoM can be used. Each chosen distribution will therefore be detailed separately end-to-end: the preprocessing steps, how the sample moments are obtained, and how the population parameters are found from them. It is important to remember that all the distributions chosen are continuous distributions, as the data is continuous.

The next step is to decide which standard distributions to fit given the sample moments. Four different distributions are chosen: Normal, Gamma, Beta and LogNormal. The reason for choosing these four is that they yielded good results with the MLE method and are therefore potentially good candidates. Every distribution has its particularities, and therefore the approach to fit each one is different.

Normal Distribution

A Normal Distribution that follows equation 3.1 is a function of the parameters σ and µ. Recall that this distribution has support y ∈ R, so the MoM can be used without any preprocessing of the data.

Because of this, we seek estimators for the parameters σ and µ, thus we must equate two pairs of population and sample moments.

The first two moments of a Normal distribution with parameters µ and σ are:

\mu_1 = \mu \quad \text{and} \quad \mu_2 = \mu^2 + \sigma^2 \quad (3.18)

Equating this to their corresponding sample moments and solving for µ and σ,

\mu_1 = \mu = m_1 = \bar{y}, \qquad \mu_2 = \mu^2 + \sigma^2 = m_2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \quad (3.19)

It is obtained,

\hat{\mu} = \bar{y}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 \quad (3.20)


Gamma Distribution

The Gamma Distribution follows equation 3.4, being a function of the parameters α and β. It has support y ∈ (0, ∞), so the data needs to be processed as stated previously.

Because of this, we seek estimators for the parameters α and β, thus we must equate two pairs of population and sample moments.

The first two moments of a Gamma distribution with parameters α and β are:

\mu_1 = \mu = \alpha\beta \quad \text{and} \quad \mu_2 = \mu^2 + \sigma^2 = \alpha\beta^2 + \alpha^2\beta^2 \quad (3.21)

Equating this to their corresponding sample moments and solving for α and β,

\mu_1 = \alpha\beta = m_1 = \bar{y}, \qquad \mu_2 = \alpha\beta^2 + \alpha^2\beta^2 = m_2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \quad (3.22)

It is obtained,

\hat{\alpha} = \frac{\bar{y}^2}{(\sum y_i^2 / n) - \bar{y}^2} = \frac{n\bar{y}^2}{\sum (y_i - \bar{y})^2}, \qquad \hat{\beta} = \frac{\bar{y}}{\hat{\alpha}} = \frac{\sum (y_i - \bar{y})^2}{n\bar{y}} \quad (3.23)

Beta Distribution

The Beta Distribution follows equation 3.8, being a function of the parameters α and β. It has support y ∈ [0, 1], so the data needs to be processed as stated previously.

Because of this, we seek estimators for the parameters α and β, thus we must equate two pairs of population and sample moments.

The first two moments of a Beta distribution with parameters α and β are:

\mu_1 = \mu = \frac{\alpha}{\alpha+\beta} \quad \text{and} \quad \mu_2 = \mu^2 + \sigma^2 = \frac{\alpha^2}{(\alpha+\beta)^2} + \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \quad (3.24)

Equating this to their corresponding sample moments and solving for α and β,

\mu_1 = \frac{\alpha}{\alpha+\beta} = m_1 = \bar{y}, \qquad \mu_2 = \frac{\alpha^2}{(\alpha+\beta)^2} + \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = m_2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \quad (3.25)

It is obtained,

\hat{\alpha} = \bar{y}\left[\frac{\bar{y}(1-\bar{y})}{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2} - 1\right], \qquad \hat{\beta} = \frac{\hat{\alpha}(1-\bar{y})}{\bar{y}} \quad (3.26)
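A minimal sketch of the Beta MoM fit, including the min-max scaling of the data into [0, 1], is shown below; the stand-in data, and the choice of loc and scale from the sample range, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
y_raw = rng.normal(11.0, 0.3, 5000)     # stand-in for the simulated Pin values

# Scale the data into [0, 1] as required by the Beta support (loc/scale preprocessing).
loc = y_raw.min()
scale = y_raw.max() - y_raw.min()
y = (y_raw - loc) / scale

ybar = y.mean()
var = y.var(ddof=0)                      # (1/n) * sum (y_i - ybar)^2

# Method of Moments estimators from equation 3.26.
alpha_hat = ybar * (ybar * (1 - ybar) / var - 1)
beta_hat = alpha_hat * (1 - ybar) / ybar
print(alpha_hat, beta_hat, loc, scale)
```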


LogNormal Distribution

The LogNormal Distribution follows equation 3.12, being a function of the parameters µ and σ. It has support y ∈ (0, ∞), so the data needs to be processed as stated previously.

Because of this, we seek estimators for the parameters σ and µ, thus we must equate two pairs of population and sample moments.

The first two moments of a LogNormal distribution with parameters µ and σ are:

\mu_1 = e^{\mu + \sigma^2/2} \quad \text{and} \quad \mu_2 = e^{2\mu+\sigma^2} + (e^{\sigma^2}-1)\,e^{2\mu+\sigma^2} \quad (3.27)

Equating these to their corresponding sample moments and solving for µ and σ,

\mu_1 = e^{\mu+\sigma^2/2} = m_1 = \bar{y}, \qquad \mu_2 = e^{2\mu+\sigma^2} + (e^{\sigma^2}-1)\,e^{2\mu+\sigma^2} = m_2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \quad (3.28)

It is obtained,

\hat{\mu}_{MoM} = \log\left(\frac{\bar{y}}{\sqrt{\dfrac{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}{\bar{y}^2} + 1}}\right), \qquad \hat{\sigma}^2_{MoM} = \log\left(\frac{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}{\bar{y}^2} + 1\right) \quad (3.29)

3.3 Validation Methods

Quantile-Quantile Plot

This section details the process of creating a QQ Plot to be able to compare graphically different fitted distributions and get insights from them in order to choose the best potential fits. It is important to recall that more than one distribution can be compared on a QQ Plot, so the explanation will talk about d different distributions.

To create a QQ Plot it is necessary to go through a few steps. The first one is to draw n values from each distribution that will be the object of comparison. It is also necessary to draw n values from the target data, i.e. the data whose probability distribution is the object of study of this work. Finally, every subsample drawn is sorted and divided into k quantiles.

From this point, it is possible to create a scatter plot, which is a diagram that uses Cartesian coordinates to display values of different variables: one of them determines the position on the horizontal axis and the other variables determine the position on the vertical axis. Thus, on the horizontal axis there will be the quantiles of the Target Cumulative Data and on the vertical axis there will be the quantiles of the different proposal distributions, or Observed Cumulative Data.

Once the plot is obtained, different insights can be gained by observing it, the most important being that the closer the plotted points of a proposal distribution are to the 45° line x = y, the better the fit is. On the other hand, if the plotted points fall far from the reference line, the proposal distribution is not a good fit. It can also be seen that if one subset of consecutive points lies on one side of the x = y line and the other subset on the other side, the distribution is skewed. The same intuition applies to the tails: if the points in the central part of the plot are close to the line while the corner parts of the plot - bottom-left and top-right - are far from the line, the tails of the distribution are not properly captured.
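A minimal sketch of such a QQ plot, comparing the sample quantiles of synthetic stand-in data against the quantiles of two fitted proposal distributions with matplotlib and scipy, is shown below; the data and the two candidate families are assumptions for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(9)
y = rng.normal(11.0, 0.3, 5000)                # stand-in for the target data

k = 200
probs = (np.arange(1, k + 1) - 0.5) / k         # probability levels for k quantiles
sample_q = np.quantile(y, probs)

# Quantiles of two fitted proposal distributions (parameters estimated from the data).
norm_q = stats.norm.ppf(probs, *stats.norm.fit(y))
gamma_q = stats.gamma.ppf(probs, *stats.gamma.fit(y))

plt.scatter(sample_q, norm_q, s=8, label="Normal fit")
plt.scatter(sample_q, gamma_q, s=8, label="Gamma fit")
lims = [sample_q.min(), sample_q.max()]
plt.plot(lims, lims, "k--", label="45-degree reference")   # x = y reference line
plt.xlabel("Target data quantiles")
plt.ylabel("Fitted distribution quantiles")
plt.legend()
plt.show()
```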

Pearson's χ² Test

The Pearson's χ² test is used to accept or reject hypotheses; in the case of continuous random variables it can be used to decide whether a certain model fits the observed data. In this particular case, the null hypothesis H_0 and the alternative hypothesis H_1 are

H_0 : F = F_0
H_1 : F \neq F_0 \quad (3.30)

where, if the null hypothesis H_0 cannot be rejected, it can be assumed that the model fits the data.

To do so, the first step is to divide the proposal distribution into k quantiles. The dataset then needs to be divided into the same quantiles. By means of equation 2.8 it is possible to compute the test statistic, which is the sum over all quantiles of the weighted squared deviations of the observed values from their expected values, and which follows a χ² distribution with k - 1 degrees of freedom.

Once the test statistic is obtained, it is compared to the χ² critical value for the given number of degrees of freedom and the chosen level of the test α. If the value of the test statistic is smaller than the χ² critical value it is not possible to reject H_0 in favor of H_1, i.e. the test cannot reject that the sample comes from the proposal distribution.
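A minimal sketch of this goodness-of-fit test for a fitted Normal model, using bins with equal probability under the null hypothesis, is shown below; the stand-in data, the number of bins and the reduction of the degrees of freedom by the number of estimated parameters (a common adjustment) are choices made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
y = rng.normal(11.0, 0.3, 5000)            # stand-in for the target data

k = 20                                      # number of bins (quantile-based)
mu_hat, sigma_hat = stats.norm.fit(y)

# Bin edges with equal probability under the fitted Normal model.
edges = stats.norm.ppf(np.linspace(0, 1, k + 1), mu_hat, sigma_hat)
edges[0], edges[-1] = y.min() - 1.0, y.max() + 1.0   # replace infinite end edges
observed, _ = np.histogram(y, bins=edges)
expected = np.full(k, len(y) / k)           # equal expected counts per bin

# Test statistic of equation 2.8 and its p-value; df reduced by the 2 fitted parameters.
y2 = np.sum((observed - expected) ** 2 / expected)
df = k - 1 - 2
p_value = stats.chi2.sf(y2, df)
print(f"Y^2 = {y2:.2f}, df = {df}, p-value = {p_value:.3f}")
```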

Likelihood Ratio Test

The LRT can be used for the comparison of two non-nested models; however, there are some issues that need to be taken care of. Given two non-nested models from two different families of distributions A and B, the Q from equation 2.11 quantifies the relative performance of the model of family A over B, or the other way around. Taking model A and model B into account, the LRT test statistic Q can be defined as

Q = -2\log\left(\frac{\mathcal{L}_A}{\mathcal{L}_B}\right) = 2\left(\log\mathcal{L}_B - \log\mathcal{L}_A\right). \quad (3.31)

This result raises two issues:

• The test statistic for the LRT under the null hypothesis will not, in general, converge to a χ² distribution.

• It is difficult to decide which one of the two models should be treated as the null model. Even though the magnitude of the resulting value is unaffected by this choice, its sign will be the opposite for each case, and the reference distribution obtained will be different.

To deal with these problems, [Wil70] proposes an approach that considers both models as null models, after which the LRT Q value is classified into one of 4 different categories.

1. Model A is preferred over model B.

2. Model B is preferred over model A.

3. Both models are equally good.

4. None of the models is good enough.

This methodology is based on the idea that if it is possible to simulate from model A, this being the null model, then it is possible to simulate the reference distribution without the need to derive its mathematical properties. To do this, the following steps need to be followed.

1. Generate a large number S of simulated samples from the fitted model A.

2. Fit each simulated sample by MLE to both family A and family B, and compute Q_i for i = 1 to S, where Q_i is the Q value of that sample.

3. Compare Q to all the computed Q_i's. If Q is extreme relative to the Q_i's, the null hypothesis H_0 : F = F_A is rejected.

It is possible to find an approximate p-value for this LRT, where the p-value is the proportion of simulated statistics Q_i that are larger in magnitude than Q. The quality of this method improves with the number S of samples simulated from the fitted model: it is a very simple approach mathematically, but computationally expensive.
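
The following is a minimal sketch of this simulation-based LRT in Python, assuming SciPy distribution families (for example stats.norm and stats.gamma); all names are illustrative and the exact tooling used in the thesis may differ.

import numpy as np
from scipy import stats

def simulated_lrt(data, family_a, family_b, S=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)

    def log_lik(family, sample):
        params = family.fit(sample)                    # MLE fit within the family
        return np.sum(family.logpdf(sample, *params))

    # Observed statistic Q = -2 log(L_A / L_B) = 2 (log L_B - log L_A)
    Q = 2.0 * (log_lik(family_b, data) - log_lik(family_a, data))

    # Reference distribution of Q under H0: the data comes from the fitted model A
    params_a = family_a.fit(data)
    Q_ref = np.empty(S)
    for i in range(S):
        sim = family_a.rvs(*params_a, size=n, random_state=rng)
        Q_ref[i] = 2.0 * (log_lik(family_b, sim) - log_lik(family_a, sim))

    p_value = np.mean(np.abs(Q_ref) >= np.abs(Q))      # proportion more extreme than Q
    return Q, p_value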

CDF Estimation

The goal of this work is to find the probability that an observation, specifically the Power Output, will be within some given margins. This is equivalent to looking for the probability that an observation will lie outside some limits. The tool used to do so is the Delta Method, explained below.

The distribution chosen for this part is the Normal distribution, so the next step is to find its Cumulative Distribution Function (CDF). The CDF of a random variable Y, denoted F(y), is the probability that Y takes a value less than or equal to y, and is the integral of its probability density function,

F_Y(y) = \int_{-\infty}^{y} f_Y(t)\,dt. (3.32)

For the generic Normal distribution, the integral of equation 3.1 with mean µ and standard deviation σ is usually denoted as,

\Phi(y) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{y - \mu}{\sigma\sqrt{2}}\right)\right], (3.33)

where erf stands for the Error Function. The Error Function is a complex function of a complex variable defined as,

\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-t^2}\,dt. (3.34)
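
As a quick numerical sanity check of equation 3.33 (the values of µ, σ and y below are purely illustrative):

import numpy as np
from scipy.special import erf
from scipy.stats import norm

mu, sigma, y = 10.86, 0.54, 10.0
cdf_via_erf = 0.5 * (1.0 + erf((y - mu) / (sigma * np.sqrt(2.0))))
assert np.isclose(cdf_via_erf, norm.cdf(y, loc=mu, scale=sigma))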

The Delta Method (DM) states that the asymptotic behaviour of functions of random variables can be approximated, provided the random variables are themselves asymptotically normal. Briefly, the theorem states that the expected value and variance of any sufficiently smooth function g(u) can be approximated reasonably well. In this case the function g(u) stands for the probability that Y takes a value less than or equal to y, seen as a function of µ and σ, while y is treated as a constant. Therefore, u = (µ, σ) are the population parameters.


The target function is,

g(\mu, \sigma) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{y - \mu}{\sigma\sqrt{2}}\right)\right]. (3.35)

Let 3.35 be the target function; applying equation 2.13, it is possible to find the distribution of the estimated CDF. The gradient is obtained by differentiating 3.35 with respect to u. The result is,

∇g(u) = "Bg Bg # =   ´  1 σφ y´µ σ  ´ y´µ σ2  φ y´µ σ   , (3.36)

where φ stands for the standard normal density function.

The covariance matrix Σ can be obtained under the assumption that the data is Normally distributed. In particular, U_n = (ȳ_i, S_i) are the random variables involved, and the covariance matrix is defined as,

\Sigma = \begin{bmatrix} \operatorname{cov}(\bar{y}_i, \bar{y}_i) & \operatorname{cov}(\bar{y}_i, S_i) \\ \operatorname{cov}(S_i, \bar{y}_i) & \operatorname{cov}(S_i, S_i) \end{bmatrix}. (3.37)

Since ȳ_i and S_i are independent of each other, their covariance is cov(ȳ_i, S_i) = 0. The variance of the sample mean follows from the variance sum law and is,

\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n}, (3.38)

where n is the number of observations in the dataset. Recall that the sample variance is defined as,

s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}, (3.39)

then, using the fact that \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}, the variance of the sample variance can be obtained through the following steps,

\operatorname{Var}\!\left(\frac{(n-1)s^2}{\sigma^2}\right) = \operatorname{Var}\!\left(\chi^2_{n-1}\right)
\frac{(n-1)^2}{\sigma^4}\operatorname{Var}(s^2) = 2(n-1)
\operatorname{Var}(s^2) = \frac{2(n-1)\sigma^4}{(n-1)^2}
\operatorname{Var}(s^2) = \frac{2\sigma^4}{n-1}.

Then, the covariance matrix Σ is defined as,

\Sigma = \begin{bmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n-1} \end{bmatrix}. (3.40)

Finally, combining 2.12 and 2.13, it is obtained,

\left(g(U_n) - g(u)\right) \xrightarrow{L} N\!\left(0, \frac{\nabla g(u)^T \Sigma\, \nabla g(u)}{n}\right), (3.41)


where n is the number of observations in the sample.

The values u = (µ, σ) can be substituted by the sample mean and sample standard deviation respectively. As mentioned before, the sample mean and sample variance converge to µ and σ² as n → ∞. Using this method it is also feasible to find the number of observations necessary to estimate a value with an uncertainty lower than that permitted in industry processes.

Finally, having the distribution of the probability of an observation falling outside the limits, it is possible to obtain intervals with the desired uncertainty, i.e. it is possible to attach a probability statement to different tolerance limits.
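
A minimal sketch of this delta-method calculation in Python is shown below. To keep the computation self-consistent, the gradient here is taken with respect to (µ, σ²) so that it matches the covariance matrix of the sample mean and sample variance, and the 1/n factors are folded into Σ; variable names and the example limit are illustrative.

import numpy as np
from scipy import stats

def estimated_cdf_with_se(data, y):
    n = len(data)
    mu, s2 = data.mean(), data.var(ddof=1)
    sigma = np.sqrt(s2)

    z = (y - mu) / sigma
    p_hat = stats.norm.cdf(z)                      # estimated P(Y <= y), cf. eq. 3.35

    phi = stats.norm.pdf(z)
    grad = np.array([-phi / sigma,                 # d g / d mu
                     -z * phi / (2.0 * s2)])       # d g / d sigma^2

    Sigma = np.diag([s2 / n,                       # Var(sample mean), eq. 3.38
                     2.0 * s2 ** 2 / (n - 1)])     # Var(sample variance)

    var_p = grad @ Sigma @ grad
    return p_hat, np.sqrt(var_p)                   # estimate and its standard error

# Example: probability below a lower tolerance limit, with an approximate 95% interval
# p, se = estimated_cdf_with_se(power_output, y=9.5)
# interval = (p - 1.96 * se, p + 1.96 * se)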

Process Capability Index

To compute the CP it is necessary to have the quantiles of the distribution and the lower and upper limits that define its borders. The quantiles were obtained previously, and formula 2.14 is now used to compute the CP value.

From this point, if the CP value is larger than 1.67 then, according to Table 2.1, the modelling satisfies the automotive industry requirements. If, on the other hand, the value is lower, the modelling is too optimistic or wrong and cannot be used to design safety elements.
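
As a sketch, assuming for equation 2.14 the common quantile-based form CP = (USL − LSL) / (Q(0.99865) − Q(0.00135)), which reduces to (USL − LSL)/(6σ) for a normal distribution; the specification limits and the fitted distribution below are illustrative only:

from scipy import stats

def process_capability(dist, lower_spec, upper_spec):
    q_low, q_high = dist.ppf([0.00135, 0.99865])   # natural spread of the fitted model
    return (upper_spec - lower_spec) / (q_high - q_low)

fitted = stats.norm(loc=10.86, scale=0.29)         # illustrative fitted distribution
cp = process_capability(fitted, lower_spec=9.0, upper_spec=12.7)
meets_requirement = cp >= 1.67                     # threshold from Table 2.1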

3.4 Workflow Diagram

The way this thesis proposes to work with the data is shown in Fig 3.2. The simulation of the data is not included, since the expectation is that this methodology is also valid for real-world data. Each step is explained in the following pages, and the diagram is intended to provide a clear picture of the whole work.

Figure 3.2: Workflow diagram.

4 Results

This chapter will present the results obtained throughout all the experiments. These are expected to provide a quantitative framework that will help to answer the research questions that are the driving force of this work.

4.1 Fitting Methods

In this initial part of the work, different statistical distributions that model the behaviour of the data are fitted. It is important to recall that the dataset used for this purpose consists of 20000 observations, and that the likelihood is directly linked to the number of data points.

The results obtained with each of the two fitting methods studied in this work are the following.

Maximum Likelihood Estimation for the Exponential Family

Recall that the MLE estimators are obtained by maximising the likelihood of the model given a certain dataset. In this regard, the logarithmic likelihood is the quantity reported in this work, as it is an equivalent measure. For each of the distributions from the exponential family, the estimated parameters are provided along with the corresponding log-likelihood.
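
A sketch of how such estimates and log-likelihoods can be produced is given below, assuming SciPy's fitting routines; the exact tooling used to obtain the numbers reported in this chapter may differ.

import numpy as np
from scipy import stats

def fit_and_loglik(family, data, **fit_kwargs):
    params = family.fit(data, **fit_kwargs)        # closed-form or numerical MLE
    loglik = np.sum(family.logpdf(data, *params))
    return params, loglik

# data = np.loadtxt("power_output.csv")            # placeholder for the dataset
# for family in (stats.norm, stats.gamma, stats.beta, stats.lognorm, stats.expon):
#     print(family.name, fit_and_loglik(family, data))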

Normal Distribution

The parameters obtained using equations 3.3 and the simulated data for the Normal distribution are,

\hat{\mu}_{MLE} = 10.8557, \quad \hat{\sigma}^2_{MLE} = 0.2892 (4.1)

The associated log-likelihood, calculated using 3.2, is

\ell = -3566.8835 (4.2)

Figure 4.1 shows the Normal distribution with such parameters (orange line), together with the histogram of the data (light blue).


Figure 4.1: Normal distribution with calculated MLE parameters.

Gamma Distribution

The parameters obtained using equations 3.7 and the simulated data for the Gamma distribution - where the distribution is shifted loc = -28.1170 units - and using an optimization algorithm are,

\hat{\alpha}_{MLE} = 18158.8348, \quad \hat{\beta}_{MLE} = 0.002146212788737483 (4.3)

The associated log-likelihood, calculated using 3.6, is

\ell = -3566.5157 (4.4)

Figure 4.2 shows the Gamma distribution with such parameters.

Figure 4.2: Gamma distribution with calculated MLE parameters.

Beta Distribution

The parameters obtained using equations 3.11 and the simulated data for the Beta distribution - with loc = -2.9409 and scale = 35.1571 - and using an optimization algorithm are,

\hat{\alpha}_{MLE} = 1382.2563, \quad \hat{\beta}_{MLE} = 2140.0621 (4.5)

The associated log-likelihood, calculated using 3.10, is

\ell = -3566.5134 (4.6)


Figure 4.3: Beta distribution with calculated MLE parameters.

LogNormal Distribution

The parameters obtained using equations 3.14 and the simulated data for the LogNormal distribution - where the distribution is shifted loc = -11.9149 units - are,

\hat{\mu}_{MLE} = 22.768545993004153, \quad \hat{\sigma}^2_{MLE} = 0.012696297372284948 (4.7)

The associated log-likelihood, calculated using 3.13, is

\ell = -3567.4306 (4.8)

Figure 4.4 shows the LogNormal distribution with such parameters.

Figure 4.4: LogNormal distribution with calculated MLE parameters.

Exponential Distribution

The parameters obtained using equation 3.17 and the simulated data for the Exponential distribution - where the distribution is shifted -9.7639 units - are,

\hat{\lambda}_{MLE} = 1.0918 (4.9)

The associated log-likelihood, calculated using 3.16, is

\ell = -21757.1133 (4.10)


Figure 4.5: Exponential distribution with calculated MLE parameters.

Method of Moments

Normal Distribution

The parameters obtained using equation 3.20 and the simulated data for the Normal distribution are,

\hat{\mu}_{MoM} = 10.8557, \quad \hat{\sigma}^2_{MoM} = 0.2892 (4.11)

The associated log-likelihood, calculated using 3.2, is

\ell = -3553.5776 (4.12)

Figure 4.6 shows the Normal distribution with such parameters.

Figure 4.6: Normal distribution with calculated MoM parameters.

Gamma Distribution

The parameters obtained using equation 3.23 and the simulated data for the Gamma distribution - where the distribution is shifted loc = 9.7639 units - are,

\hat{\alpha}_{MoM} = 14.2520, \quad \hat{\beta}_{MoM} = 0.0766 (4.13)

The associated log-likelihood, calculated using 3.6, is

\ell = -4291.1321 (4.14)
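
For reference, a minimal sketch of the moment-matching step for a shifted Gamma, assuming the shape/scale parameterisation in which shape = mean²/variance and scale = variance/mean after subtracting the location shift; the variable names are illustrative.

import numpy as np

def gamma_mom(data, loc):
    shifted = data - loc                     # remove the location shift
    m, v = shifted.mean(), shifted.var(ddof=1)
    alpha = m ** 2 / v                       # shape estimate
    beta = v / m                             # scale estimate
    return alpha, beta

# alpha_hat, beta_hat = gamma_mom(power_output, loc=9.7639)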

References
