
Analyzing European National Accounts Data for Detection of anomalous observation


Örebro University

Örebro University School of Business Applied Statistics, Master thesis, 30 hp Supervisor: Laitila Thomas

Examiner: Sune Karlsson Spring Semester/2014

Analyzing European National Accounts Data for Detection of anomalous observation


Abstract

In the past few years, evidence has emerged that the budget deficit data of Greece contain anomalous observations, so detecting such data is of practical importance. The purpose of this study is to detect anomalous data among the European Union member states. Outlier detection is one of the most important techniques in applied statistics. In this paper, outlier detection based on robust within group generalized M-estimation (WGM) for fixed effect panel data is applied to a government budget deficit model of the EU. The results of least trimmed squares estimation, traditional within group estimation and WGM differ, but only slightly. This may be because the proportion of outliers is not large. The last year in the dataset is then omitted and the model is re-estimated on the remaining 11 years. Compared with the results derived from the whole dataset, the outliers in the omitted year are detected.


Contents

1. Introduction

1.1 Background

2 Outliers Detection

2.1 Methods of Outliers Detection

2.2 Outliers in Panel data

3. Outliers Detection Based on Within Group Generalized M-approach

3.1 Within Group Generalized M-estimation

3.2 Outliers Detection Method

4. Application in Budget Deficit Model

4.1 Results of Estimation

4.2 Diagnose Outliers outside the Sample

5. Conclusion

Appendix


1. Introduction

1.1 Background

In the last few years, accounting fraud in the national accounts of European Union member states has become a matter of concern. In 2010, Greece was accused of falsifying statistics in its government budget deficit data. As stated in the 2010 report of the European Commission, within the period 2005 to 2009 Greece misreported its government deficit and budget data mainly for 2008 and 2009. The Greek government deficit of 2008 was revised from 5% to 7.7% of GDP, and the figure for 2009 was revised from 3.7% to 12.5% of GDP. The Greek figures were also significantly revised in the 2004 government deficit database. Government deficit and budget data are an important reference for fiscal and monetary policy, and also for investment decisions, including private investment. Misreporting of government deficit data can therefore cause severe problems, and it is crucial to detect anomalous data in the national accounts of the European Union member states.

It is not easy to detect accounting fraud, since anomalous macroeconomic data may be due to several causes, such as changes in economic circumstances, accounting skullduggery and political unrest. Rauch et al. (2011) used Benford's law to detect accounting fraud in the Greek deficit data and found that Greece shows the largest deviation from Benford's law. Benford's law is a distribution of digits which states that the number one appears as the first digit with higher probability than any other number. This distribution was found an effective way to detect [...] the data are truncated.
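To make the first-digit distribution concrete, the following minimal sketch computes the Benford probabilities P(d) = log10(1 + 1/d) and a Pearson chi-square distance between observed first digits and those probabilities. The function names and the toy samples are illustrative assumptions, and the chi-square distance is only one possible measure of deviation, not necessarily the one used by Rauch et al. (2011).

```python
import math

def benford_first_digit_probs():
    """Return {d: P(first digit = d)} under Benford's law, for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading non-zero digit of a non-zero number."""
    s = f"{abs(x):.15e}"   # scientific notation, e.g. '7.700000000000000e-03'
    return int(s[0])

def benford_chi_square(values):
    """Pearson chi-square distance between observed first digits and Benford."""
    probs = benford_first_digit_probs()
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        if v != 0:
            counts[first_digit(v)] += 1
    n = sum(counts.values())
    return sum((counts[d] - n * probs[d]) ** 2 / (n * probs[d]) for d in range(1, 10))

probs = benford_first_digit_probs()
all_ones = [1] * 9                         # hypothetical sample, every first digit is 1
mixed = [1, 1, 1, 2, 2, 3, 4, 5, 7]        # hypothetical sample, closer to Benford
dist_all_ones = benford_chi_square(all_ones)
dist_mixed = benford_chi_square(mixed)
```

A larger distance indicates a stronger deviation from Benford's law; here the sample consisting only of leading ones deviates more than the mixed sample.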

The primary budget deficit is the amount by which government spending exceeds government revenue. The total deficit also includes interest payments on the debt. In 2009, the European Union suffered a severe sovereign debt crisis, and the economic crisis led to massive losses in EU countries. “EU real GDP is projected to shrink by some 4% in 2009, the sharpest contraction in its history.” (Economic Crisis in Europe: Causes, Consequences and Responses, 2009). Greece is one of the member states of the EU. One possible domestic cause of the crisis is high government spending combined with low government revenues (Nelson et al., 2010). Kurt et al. (2012) included a dummy variable for the economic crisis in a panel model to analyze the effect of the crisis on the EU.

The purpose of this paper is to detect outliers in the EU budget deficit dataset. The detection of anomalous data is based on robust estimation of a panel data model. Following Kurt et al. (2012), the basic model includes independent variables such as total government expenditure, taxes, government fixed investment and inflation. The outlier detection method based on the WGM estimation of Bramati and Croux (2003) is used to analyze the government statistics of the European Union member states. Section 2 briefly reviews previous methods. Section 3 describes the robust fixed effect panel data model. Section 4 presents the application of the model and the results of the outlier detection. Section 5 concludes.


2 Outliers Detection

2.1 Methods of Outliers Detection

In the past decades, several methods have been proposed to detect outliers. For multivariate outliers, Rousseeuw and Zomeren (1990) provided a test based on a robust distance, which replaces the arithmetic mean and sample covariance in the Mahalanobis distance with robust minimum volume ellipsoid (MVE) estimators. For normally distributed p-dimensional data, the squared Mahalanobis distance is approximately chi-squared distributed with p degrees of freedom. Points with large Mahalanobis distances are considered outliers. Rousseeuw and Zomeren (1990) thus diagnose observations whose robust distance exceeds the critical value $\sqrt{\chi^2_{p,0.975}}$ as outliers. Robust means that the estimate tends to be less distorted by outliers. For estimates of location, for example, the median is more robust than the mean, since the mean is computed from all observations, including the outliers, and is likely to be distorted. Another robust estimator of location and scale is the minimum covariance determinant (MCD) estimator, computed with the FAST-MCD algorithm (Rousseeuw and Driessen, 1999). Another method of detecting multivariate outliers is projection pursuit (Peña and Prieto, 2001).
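The claim that the median is more robust than the mean can be illustrated with a small sketch. The numbers below are invented for illustration only: one value of a clean sample is replaced by a gross outlier, and the shift of each location estimate is measured.

```python
import statistics

clean = [2.1, 1.9, 2.0, 2.2, 1.8]          # hypothetical regular observations
contaminated = clean[:-1] + [50.0]         # replace one value by a gross outlier

# How far does each location estimate move when the outlier is introduced?
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))
```

With these values the mean moves by several units while the median moves by about 0.1, which is the sense in which the median is less distorted by a single outlier.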

For time series data, Tsay et al. (2000) extended the outlier detection method for univariate time series of Tsay (1988) to multivariate time series. The test is based on the vector ARIMA model and is designed to detect four types of outliers: additive outliers (AO), innovational outliers (IO), level shifts (LS) and temporary changes (TC), which were proposed in work extending Fox (1972). Muirhead (1986) compared the likelihood ratio rule and [...] (observation) outliers and Type II (innovation) outliers. Further details can be found in Tsay et al. (2000) and Muirhead (1986).

In linear regression models, Rousseeuw and Leroy (1987) defined three types of outliers: vertical outliers, good leverage points and bad leverage points. To illustrate, consider a simple regression model:

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \dots, n \tag{1}$$

A vertical outlier is a point that is outlying in the y-direction. A bad leverage point is outlying in the explanatory variables and does not follow the line of the regular points. A good leverage point is a point that lies near the line $y = \alpha + \beta x$ but far away from the regular points. This type of outlier does not affect the estimation of the coefficients in the model.

Outliers can distort ordinary least squares (OLS) coefficient estimates. The OLS estimator finds the coefficients that minimize the sum of squared residuals. To overcome this distortion, Rousseeuw (1984) introduced a robust estimator for the regression model, the least trimmed squares (LTS) method, which minimizes the sum of the h smallest squared residuals. Formally, it is defined as

$$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} (r^2)_{(i)} \tag{2}$$

where $(r^2)_{(1)} \le (r^2)_{(2)} \le \dots \le (r^2)_{(n)}$ are the ordered squared residuals.
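The LTS objective in equation (2) can be sketched for the simple regression model as follows. This is a minimal illustration, not the algorithm used in the paper: it approximates the LTS fit by searching only over lines through pairs of data points, and the function names and toy data are assumptions.

```python
from itertools import combinations

def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals."""
    return sum(sorted(r * r for r in residuals)[:h])

def lts_line(x, y, h):
    """Approximate LTS fit of y = a + b*x by trying every line through two points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # vertical line, no finite slope
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        obj = lts_objective([yk - a - b * xk for xk, yk in zip(x, y)], h)
        if best is None or obj < best[0]:
            best = (obj, a, b)
    return best[1], best[2]

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 40.0, -20.0]   # last two are vertical outliers
a, b = lts_line(x, y, h=6)
```

Because the two outlying points fall outside the six smallest squared residuals, the fitted line recovers the intercept 1 and slope 2 of the six regular points.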

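However, the choice of h is arbitrary, and the estimator tolerates only up to 25% outliers. Huber (1981) proposed a robust M-estimator that minimizes the sum of a function of the residuals, i.e.

$$\hat{\beta}_{M} = \arg\min_{\beta} \sum_{i=1}^{n} \rho(r_i) \tag{3}$$

where $r_i$ is the estimated residual and $\rho(\cdot)$ is a symmetric function with a unique minimum at x = 0. A widely used choice of $\rho$ is Tukey's biweight function (Beaton and Tukey, 1974):

$$\rho(x) = \begin{cases} \dfrac{x^2}{2} - \dfrac{x^4}{2c^2} + \dfrac{x^6}{6c^4}, & |x| \le c \\ \dfrac{c^2}{6}, & |x| > c \end{cases} \tag{4}$$

The estimator of $\beta$ is obtained by differentiating the objective function $\sum_i \rho(r_i)$:

$$\sum_{i=1}^{n} \psi(r_i)\, x_i = 0 \tag{5}$$

where $\psi(x) = \rho'(x)$. This is the first order condition of the optimization problem in equation (3).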
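The relation between Tukey's biweight function and its derivative in the first order condition can be checked numerically. The sketch below assumes the common parameterization of the biweight, with rho equal to x^2/2 - x^4/(2c^2) + x^6/(6c^4) inside [-c, c] and constant outside, and uses the tuning constant c = 4.685 that appears later in the text.

```python
C = 4.685  # tuning constant; the value quoted later in the text

def rho(x, c=C):
    """Tukey's biweight loss (assumed parameterization)."""
    if abs(x) <= c:
        return x**2 / 2 - x**4 / (2 * c**2) + x**6 / (6 * c**4)
    return c**2 / 6

def psi(x, c=C):
    """Derivative of rho: x * (1 - (x/c)^2)^2 inside [-c, c], zero outside."""
    if abs(x) <= c:
        return x * (1 - (x / c) ** 2) ** 2
    return 0.0

# Central finite difference of rho at x = 1 should match psi(1).
step = 1e-6
numeric_derivative = (rho(1 + step) - rho(1 - step)) / (2 * step)
```

Since psi vanishes beyond c, residuals larger than c in absolute value have no influence on the first order condition.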

However, this kind of estimator is not robust to leverage points. Mallows (1975) proposed the generalized M-estimator, which downweights the leverage points in the estimation:

$$\sum_{i=1}^{n} w(x_i)\, \psi(r_i)\, x_i = 0 \tag{6}$$

The standard way of displaying outliers is a scatter plot of the studentized residuals $r_i/\hat{\sigma}$ against the robust Mahalanobis distances of the explanatory variables, $RD_i$, where $\hat{\sigma}$ is the scale estimator of the residuals. According to Rousseeuw and Leroy (1987), regular observations have small $RD_i$ and small $|r_i/\hat{\sigma}|$. Vertical outliers have small $RD_i$ and large $|r_i/\hat{\sigma}|$. Good leverage points have large $RD_i$ and small $|r_i/\hat{\sigma}|$. Bad leverage points have large $RD_i$ and large $|r_i/\hat{\sigma}|$. The cutoffs for the studentized residuals in Rousseeuw and Leroy (1987) are $\pm 2.5$, and the critical value for the robust Mahalanobis distances is $\sqrt{\chi^2_{p,0.975}}$, as stated before.

2.2 Outliers in Panel data

As stated in Bramati and Croux (2003), besides vertical outliers and leverage points, another kind of outlier deserves analysis: concentrated contamination, also called block outliers. In panel data, several outliers may appear within one block (here, one country) while other blocks contain none. A block outlier might contain both vertical outliers and leverage points.

While a large literature studies outlier-robust regression models, few studies address robust estimation of panel data models. Wagenvoort and Waldmann (2002) compared two-step generalized M-estimation (2SGM) with the robust generalized method of moments (RGMM). They concluded that the RGMM method is more efficient than 2SGM if the residuals show heteroskedasticity and autocorrelation. However, both the 2SGM and the RGMM estimators are consistent only if the number of panels goes to infinity, i.e. the number of panels N is much larger than the number of time periods T.

Bramati and Croux (2003) compared two robust within group estimators for the fixed effect panel data model, the within group generalized M-estimator (WGM) and the within group MS estimator (WMS). Both estimators coped well with outliers. These two methods only require the sample size of the pooled panel dataset to be large enough.

Lucas et al. (1996) proposed a robust GMM estimator for linear dynamic panel data. The underlying idea of downweighting the residuals and instrumental variables is similar to the WGM method.

3. Outliers Detection Based on Within Group Generalized M-approach

3.1 Within Group Generalized M-estimation

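Consider the fixed effect linear panel data model:

$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T \tag{7}$$

where i indexes the cross-section dimension and t the time series dimension. $y_{it}$ is the dependent variable. $x_{it} = (x_{it}^{(1)}, \dots, x_{it}^{(p)}, \dots, x_{it}^{(P)})'$ is the $P \times 1$ vector of explanatory variables. $\alpha_i$ is the time-invariant fixed effect of each group. $\beta$, of dimension $P \times 1$, is the coefficient vector of the model. $\varepsilon_{it}$ is the error term. In this paper, heteroscedasticity and correlation of the error term are not considered: the error terms are assumed to be uncorrelated over time and across cross-sections, and the explanatory variables are considered exogenous.

The model can also be presented in matrix form:

$$y = \alpha \otimes \iota_T + X\beta + \varepsilon \tag{8}$$

where $y = (y_{11}, y_{12}, \dots, y_{NT})'$ is a vector of dimension $NT \times 1$, $\iota_T = (1, \dots, 1)'$ is a $T \times 1$ vector with all elements equal to 1, $\alpha = (\alpha_1, \dots, \alpha_N)'$ is an $N \times 1$ vector, $X = (x_{11}, x_{12}, \dots, x_{NT})'$ is an $NT \times P$ matrix, $\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \dots, \varepsilon_{NT})'$ is the residual vector, and $\otimes$ denotes the Kronecker product.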

The within group (WG) estimator is a widely used way to estimate the fixed effect panel data model. The first step of the classical WG estimator is to demean the data. Let

$$\tilde{y}_{it} = y_{it} - \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \tilde{x}_{it} = x_{it} - \frac{1}{T}\sum_{t=1}^{T} x_{it}.$$

The fixed effect is then eliminated from the estimation of the parameter $\beta$, and the model to be estimated becomes

$$\tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{\varepsilon}_{it} \tag{9}$$

The classical WG estimator then uses ordinary least squares to estimate the parameter $\beta$:

$$\hat{\beta}_{WG} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y} \tag{10}$$

This method greatly reduces the number of parameters. However, the demeaning step and the OLS estimation suffer from the masking effect of outliers: the parameter estimates are distorted so that the outliers appear less outlying and cannot be detected.
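The two steps of the classical within group estimator, demeaning within each group and then applying OLS, can be sketched for a single regressor as follows. The group labels, the fixed effects and the data are invented for illustration; with one regressor, equation (10) reduces to a ratio of sums.

```python
def demean_within(groups):
    """groups: {id: [values over time]} -> same shape, minus each group's mean."""
    return {g: [v - sum(vals) / len(vals) for v in vals] for g, vals in groups.items()}

def wg_slope(x, y):
    """Within group estimate of beta for one regressor: sum(x~ y~) / sum(x~ x~)."""
    xt, yt = demean_within(x), demean_within(y)
    sxy = sum(a * b for g in xt for a, b in zip(xt[g], yt[g]))
    sxx = sum(a * a for g in xt for a in xt[g])
    return sxy / sxx

# Two hypothetical groups with different fixed effects and a common slope of 2.
x = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 9.0]}
fixed_effects = {"A": 10.0, "B": -5.0}
y = {g: [fixed_effects[g] + 2.0 * v for v in x[g]] for g in x}
beta_wg = wg_slope(x, y)
```

The fixed effects drop out after demeaning, so the common slope is recovered without estimating one intercept per group.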

The WGM method in Bramati & Croux (2003) is analogous to the classical WG method. To eliminate the fixed effect, the first step is to subtract the within group median. Bramati & Croux (2003) chose the median instead of the mean because the mean might be distorted by outliers, which would in turn distort the final estimate.

$$\tilde{y}_{it} = y_{it} - \mathrm{med}_t\, y_{it}, \qquad \tilde{x}_{it}^{(p)} = x_{it}^{(p)} - \mathrm{med}_t\, x_{it}^{(p)}, \quad \text{for } p = 1, \dots, P \tag{11}$$

The WGM estimation is completed by downweighting both vertical outliers and leverage points. Vertical outliers correspond to large residuals. Residuals based on an initial estimate are used to obtain the weights $W_r$. Leverage points correspond to large robust Mahalanobis distances, and the weights $W_x$ are added to the estimation to downweight the points with large distances. The final estimator is thus

$$\hat{\beta}_{WGM} = (\tilde{X}' W_x W_r \tilde{X})^{-1} \tilde{X}' W_x W_r \tilde{y}.$$
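The median centering step of equation (11) can be sketched as below. The function name and the toy series are illustrative assumptions; the series contains one gross outlier to show that it remains clearly visible after centering.

```python
import statistics

def center_by_median(groups):
    """groups: {id: [values over time]} -> values minus each group's median."""
    centered = {}
    for g, vals in groups.items():
        med = statistics.median(vals)
        centered[g] = [v - med for v in vals]
    return centered

y = {"A": [1.0, 2.0, 3.0, 100.0, 2.5]}     # hypothetical group with one outlier
y_tilde = center_by_median(y)
```

Here the median is 2.5, so the regular values stay close to zero after centering, whereas subtracting the mean (21.7) would shift every regular value by roughly 20.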

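The least trimmed squares estimator described in Rousseeuw (1984) is used as the initial value of the WGM method. The residuals based on this initial value are used to compute the weights that downweight the vertical outliers.

$$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{k=1}^{h} \left[ (\tilde{y}_{it} - \tilde{x}_{it}'\beta)^2 \right]_{(k)} \tag{12}$$

where $\left[ (\tilde{y}_{it} - \tilde{x}_{it}'\beta)^2 \right]_{(1)} \le \left[ (\tilde{y}_{it} - \tilde{x}_{it}'\beta)^2 \right]_{(2)} \le \dots \le \left[ (\tilde{y}_{it} - \tilde{x}_{it}'\beta)^2 \right]_{(NT)}$ are the ordered squared residuals and $h$ is the number of squared residuals retained in the sum. The residuals based on this estimator are used in the generalized M-approach that follows.

The residuals of the LTS-estimated model are denoted $r_{it} = \tilde{y}_{it} - \tilde{x}_{it}'\hat{\beta}_{LTS}$. In Bramati & Croux (2003), the robust scale estimator of the LTS residuals is defined as

$$\hat{\sigma}^2_{LTS} = c_{LTS}\, \frac{1}{h} \sum_{k=1}^{h} \left[ (\tilde{y}_{it} - \tilde{x}_{it}'\hat{\beta}_{LTS})^2 \right]_{(k)},$$

where $c_{LTS}$ is a constant that makes $\hat{\sigma}^2_{LTS}$ a consistent estimator of $\sigma^2$ under the normal distribution $N(0, \sigma^2)$.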

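Based on the LTS residuals, Bramati & Croux (2003) introduced a weight matrix $W_r$ to downweight the vertical outliers. The weight matrix $W_r$ is a diagonal matrix with elements

$$(W_r)_{it} = \frac{\psi(r_{it}/\hat{\sigma}_{LTS})}{r_{it}/\hat{\sigma}_{LTS}},$$

where $r_{it}$ is the LTS residual and $\psi = \rho'$ is the derivative of Tukey's biweight function

$$\rho(x) = \begin{cases} \dfrac{x^2}{2} - \dfrac{x^4}{2c^2} + \dfrac{x^6}{6c^4}, & |x| \le c \\ \dfrac{c^2}{6}, & |x| > c \end{cases} \tag{13}$$

where c = 4.685, which is the standard choice of c according to Wagenvoort & Waldmann (2002).

Thus, the diagonal elements can be expressed as

$$(W_r)_{it} = \begin{cases} \left( 1 - \left( \dfrac{r_{it}}{c\,\hat{\sigma}_{LTS}} \right)^2 \right)^2, & |r_{it}/\hat{\sigma}_{LTS}| < c \\ 0, & |r_{it}/\hat{\sigma}_{LTS}| \ge c \end{cases} \tag{14}$$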
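The residual weights in equation (14) can be sketched as a small function. The function name and the example residuals are assumptions for illustration; the scale is taken as given rather than estimated.

```python
def residual_weight(r, scale, c=4.685):
    """Biweight weight for a residual r standardized by a robust scale estimate."""
    u = r / scale
    if abs(u) >= c:
        return 0.0                       # gross vertical outlier: removed entirely
    return (1 - (u / c) ** 2) ** 2       # smooth downweighting inside (-c, c)

residuals = [0.0, 1.0, 2.0, 4.0, 5.0]    # hypothetical LTS residuals, scale = 1
weights = [residual_weight(r, 1.0) for r in residuals]
```

The weights decrease smoothly from 1 at a zero residual to 0 once the standardized residual reaches c, so moderate residuals are only partly downweighted.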

For leverage points, Bramati & Croux (2003) introduced the weight matrix $W_x$, based on the robust Mahalanobis distance, into the estimation of the parameter. The robust Mahalanobis distance of the independent variables is defined as

$$RD_{it} = \sqrt{(\tilde{x}_{it} - \hat{\mu})'\, \hat{V}^{-1} (\tilde{x}_{it} - \hat{\mu})} \tag{15}$$

where $\hat{\mu}$ and $\hat{V}$ are the minimum covariance determinant (MCD) estimators of location and scatter of the independent variables. The MCD draws a subset of size $h' = (NT + P + 1)/2$ with the smallest determinant of the covariance matrix from the NT observations of the explanatory variables. To explain how the algorithm for these estimators works, Theorem 1 is introduced. The proof of the theorem can be found in Rousseeuw & Driessen (1999).

Theorem 1. For a subset $H_1$ of size $h' = (NT + P + 1)/2$, let $\hat{\mu}_1 = \frac{1}{h'} \sum_{H_1} \tilde{x}_{it}$ and $\hat{V}_1 = \frac{1}{h'} \sum_{H_1} (\tilde{x}_{it} - \hat{\mu}_1)(\tilde{x}_{it} - \hat{\mu}_1)'$ be the location and scale estimators. If $\hat{V}_1$ is nonsingular, the corresponding Mahalanobis distances are $d_1(it) = \sqrt{(\tilde{x}_{it} - \hat{\mu}_1)'\, \hat{V}_1^{-1} (\tilde{x}_{it} - \hat{\mu}_1)}$. Sort the distances, take the subset $H_2$ with the $h'$ smallest distances, and compute the scale estimator $\hat{V}_2$ and the location estimator $\hat{\mu}_2$ in the same way as $\hat{V}_1$ and $\hat{\mu}_1$, but using the subset $H_2$. Then the determinant of $\hat{V}_2$ is no larger than that of $\hat{V}_1$, $\det(\hat{V}_2) \le \det(\hat{V}_1)$, with equality if and only if $\hat{\mu}_2 = \hat{\mu}_1$ and $\hat{V}_2 = \hat{V}_1$.

The estimators are computed with the Fast-MCD algorithm proposed by Rousseeuw and Driessen (1999). Based on Theorem 1, the Fast-MCD algorithm seeks the location and scale estimators for which the determinant of the scale estimator is lowest. The procedure selects the subset of the sample with the smallest Mahalanobis distances. As stated in section 2.1, points with large Mahalanobis distances are outliers, so the MCD estimators are unaffected by them. Compared with the basic MCD algorithm, Fast-MCD improves the construction of the initial subset $H_1$ and the computational efficiency.
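The concentration step described in Theorem 1 can be illustrated in the simplest case of one variable, where the determinant of the scatter estimate is just the variance. This is a minimal sketch under that simplifying assumption, with invented data and a deliberately poor starting subset; it is not the Fast-MCD algorithm itself.

```python
def subset_variance(data, subset):
    """Variance (divisor h) of the observations indexed by subset."""
    m = sum(data[i] for i in subset) / len(subset)
    return sum((data[i] - m) ** 2 for i in subset) / len(subset)

def c_step(data, subset, h):
    """One concentration step for univariate data (p = 1)."""
    m = sum(data[i] for i in subset) / len(subset)
    v = subset_variance(data, subset)
    dist = [abs(x - m) / v ** 0.5 for x in data]          # Mahalanobis distances
    return sorted(range(len(data)), key=lambda i: dist[i])[:h]

data = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 9.0, 10.0]   # last two values are outliers
h = 5
h1 = [3, 4, 5, 6, 7]                               # start subset containing both outliers
h2 = c_step(data, h1, h)
var1 = subset_variance(data, h1)
var2 = subset_variance(data, h2)
```

After one step the new subset no longer contains the two outlying values, and its variance is no larger than that of the starting subset, as the theorem states.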

The diagonal elements of the weight matrix $W_x$ are defined by Bramati & Croux (2003) as

$$(W_x)_{it} = \min\left( 1, \frac{\sqrt{\chi^2_{P,0.975}}}{RD_{it}} \right),$$

where $\chi^2_{P,0.975}$ is the 97.5% quantile of the chi-square distribution with P degrees of freedom. The weight matrix downweights the observations that are outlying, i.e. whose distance exceeds the cutoff point $\sqrt{\chi^2_{P,0.975}}$.

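The within group generalized M-estimator of the parameter is now computed as

$$\hat{\beta}_{WGM} = (\tilde{X}' W_x W_r \tilde{X})^{-1} \tilde{X}' W_x W_r \tilde{y}. \tag{16}$$

The standard error is computed as in Bramati and Croux (2003):

$$\widehat{Cov}(\hat{\beta}_{WGM}) = \hat{\sigma}^2_{LTS}\, (\tilde{X}' W_1 \tilde{X})^{-1} (\tilde{X}' W_2 \tilde{X}) (\tilde{X}' W_1 \tilde{X})^{-1} \tag{17}$$

where $W_1$ and $W_2$ are diagonal matrices. The diagonal elements of $W_1$ are

$$(W_1)_{it} = (W_x)_{it}\, \rho''\!\left( \frac{r_{it}}{\hat{\sigma}_{LTS}} \right), \quad \text{for } i = 1, \dots, N, \; t = 1, \dots, T \tag{18}$$

where $r_{it}$ is the LTS residual, $\rho(\cdot)$ is Tukey's biweight function as stated before, and $\rho''(\cdot)$ is the second derivative of $\rho(\cdot)$. The diagonal elements of $W_2$ are

$$(W_2)_{it} = (W_x)_{it}^2\, W^2\!\left( \frac{r_{it}}{\hat{\sigma}_{LTS}} \right) \left( \frac{r_{it}}{\hat{\sigma}_{LTS}} \right)^2, \quad \text{for } i = 1, \dots, N, \; t = 1, \dots, T \tag{19}$$

where $W(x) = \rho'(x)/x$ and $\rho'(\cdot)$ is the first derivative of $\rho(\cdot)$.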

3.2 Outliers Detection Method

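To detect vertical outliers and leverage points based on the robust WGM estimation, a scatter plot of the standardized residuals against the robust distances of the independent variables is used as a diagnostic.

The fixed effects in model (7) can be estimated with the median,

$$\hat{\alpha}_i = \mathrm{med}_t\, (y_{it} - x_{it}'\hat{\beta}_{WGM}) \tag{20}$$

(Bramati & Croux, 2003). The residuals based on the robust WGM estimation are thus $r_{it} = y_{it} - x_{it}'\hat{\beta}_{WGM} - \hat{\alpha}_i$. In this paper, the standardized residuals are computed using the method in Rousseeuw and Leroy (2005). First derive the preliminary scale estimate

$$s_0 = 1.4826 \left( 1 + \frac{5}{n - p} \right) \sqrt{\mathrm{med}\; r_{it}^2} \tag{21}$$

where n is the number of observations and p the number of estimated coefficients. Based on the preliminary scale, derive the preliminary weights

$$w_{it} = \begin{cases} 1, & |r_{it}/s_0| \le 2.5 \\ 0, & \text{otherwise} \end{cases} \tag{22}$$

The scale estimate of the residuals is then $\hat{\sigma} = \sqrt{\dfrac{\sum_i \sum_t w_{it} r_{it}^2}{\sum_i \sum_t w_{it} - p}}$. The standardized residual is $r_{it}/\hat{\sigma}$. If the absolute standardized residual $|r_{it}/\hat{\sigma}|$ exceeds 2.5, the observation is diagnosed as an outlier.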
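The two-stage scale estimate can be sketched as follows. The sketch assumes the usual Rousseeuw and Leroy form of the preliminary scale, 1.4826 (1 + 5/(n - p)) times the square root of the median squared residual, followed by a weighted scale that discards residuals beyond 2.5 preliminary scale units. The function name, the toy residuals and the choice p = 1 are illustrative assumptions.

```python
import math
import statistics

def robust_scale(residuals, p, cutoff=2.5):
    """Return (s0, sigma): preliminary scale and reweighted scale estimate."""
    n = len(residuals)
    med_sq = statistics.median([r * r for r in residuals])
    s0 = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(med_sq)          # preliminary scale
    w = [1 if abs(r / s0) <= cutoff else 0 for r in residuals]   # preliminary weights
    sigma = math.sqrt(sum(wi * r * r for wi, r in zip(w, residuals)) / (sum(w) - p))
    return s0, sigma

residuals = [0.5, -0.3, 0.2, -0.6, 0.1, 0.4, -0.2, 8.0]   # toy data, last value is gross
s0, sigma = robust_scale(residuals, p=1)
flags = [abs(r / sigma) > 2.5 for r in residuals]          # True marks an outlier
```

Because the gross residual receives weight zero, it does not inflate the final scale, and only that observation exceeds the 2.5 cutoff after standardization.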

The robust Mahalanobis distance of the explanatory variables, $RD_{it}$, is also used as an aid to detect leverage points. If the robust distance exceeds the cutoff $\sqrt{\chi^2_{P,0.975}}$, the observation is considered an outlier.

Rousseeuw and Zomeren (1990) introduced the classification of outliers and regular observations. Regular observations have small $RD_{it}$ and small $|r_{it}/\hat{\sigma}|$. Vertical outliers have small $RD_{it}$ and large $|r_{it}/\hat{\sigma}|$. Good leverage points have large $RD_{it}$ and small $|r_{it}/\hat{\sigma}|$. Bad leverage points have large $RD_{it}$ and large $|r_{it}/\hat{\sigma}|$. The results can be seen in a scatter plot of $r_{it}/\hat{\sigma}$ against $RD_{it}$.
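The four-way classification can be written as a small decision rule. The sketch below assumes P = 5 explanatory variables, so the distance cutoff is the square root of the 97.5% chi-square quantile with 5 degrees of freedom (about 12.8325, hard-coded here); the example rows reuse standardized residuals and robust distances reported in Table 6 of the appendix.

```python
import math

# 97.5% quantile of the chi-square distribution with 5 degrees of freedom.
# Assumption: P = 5 explanatory variables (GE, TAX, GFI, INF, GD).
CHI2_975_DF5 = 12.8325
DIST_CUTOFF = math.sqrt(CHI2_975_DF5)   # about 3.58
RESID_CUTOFF = 2.5

def classify(std_resid, robust_dist):
    """Label an observation from its standardized residual and robust distance."""
    large_resid = abs(std_resid) > RESID_CUTOFF
    large_dist = robust_dist > DIST_CUTOFF
    if large_resid and large_dist:
        return "bad leverage point"
    if large_resid:
        return "vertical outlier"
    if large_dist:
        return "good leverage point"
    return "regular"

# (standardized residual e, robust distance rd) taken from Table 6
rows = {
    "Austria 2001": (2.68, 0.18),
    "Italy 2001": (-0.59, 0.73),
    "Latvia 2001": (-1.45, 8.39),
    "Bulgaria 2001": (5.89, 6.36),
}
labels = {name: classify(e, rd) for name, (e, rd) in rows.items()}
```

Under these cutoffs the four example rows fall into the four different classes, matching the wi and wl indicators listed for them in Table 6.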

As stated in section 2.2, block outliers also need to be examined. Bramati and Croux (2003) used the weights from the estimation of the parameter to compare the outlyingness of different groups. Let $W = \mathrm{diag}(W_x W_r)$ be the weight vector of dimension $NT \times 1$. For each group, compute the mean weight $\bar{W}_i = \frac{1}{T} \sum_{t=1}^{T} (W)_{it}$. The mean weight indicates to what extent group i is downweighted in the model, so it is a reasonable indicator for detecting block outliers. There is no cutoff for this indicator, but it is comparable across groups.
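The group indicator is simply the time average of the product of the two weights. A minimal sketch, with hypothetical group labels and weights chosen only for illustration:

```python
def mean_group_weights(w_x, w_r):
    """Average over time of the product W_x * W_r for each group."""
    return {
        g: sum(a * b for a, b in zip(w_x[g], w_r[g])) / len(w_x[g])
        for g in w_x
    }

# Hypothetical leverage weights and residual weights for two groups, four periods.
w_x = {"A": [1.0, 1.0, 1.0, 1.0], "B": [1.0, 0.5, 0.4, 1.0]}
w_r = {"A": [1.0, 0.9, 1.0, 0.95], "B": [0.2, 1.0, 0.0, 0.8]}
mean_w = mean_group_weights(w_x, w_r)
```

Group B has the smaller mean weight, so it would be regarded as more outlying than group A, even though no fixed cutoff is applied.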

4. Application in Budget Deficit Model

In this paper, the WGM estimation is used to estimate a fixed effect panel data model for the budget deficit in the European Union. Kurt et al. (2012) investigated the effect of the economic crisis on the budget deficit in Europe. The model in this paper is similar to the model in Kurt et al. (2012), but omits the crisis dummy variable and total government revenue.

Following Kurt et al. (2012), the basic model is constructed as

$$BD_{it} = \alpha_i + \beta_1 GE_{it} + \beta_2 TAX_{it} + \beta_3 GFI_{it} + \beta_4 INF_{it} + \beta_5 GD_{it} + \varepsilon_{it} \tag{23}$$

where BD, the abbreviation of budget deficit, indicates the government budget surplus or deficit. GE indicates the total government expenditure. TAX indicates the taxes on production and imports. GFI is the government fixed investment. INF is the inflation rate. GD indicates the government debt.

However, the budget deficit is a major determinant of government debt according to Fischer & Easterly (1990), who state: “the change in debt ratio equal to the noninterest primary deficit minus the money financed by the government with printing money plus the current debt ratio times the real interest rate minus the growth rate of GNP.” Thus, the model contains an endogenous variable.

Wooldridge (2010) states that endogeneity in the explanatory variables causes inconsistency of the OLS estimator and heteroskedasticity of the error term. Heteroskedasticity of the error term invalidates tests of the model parameters. In robust estimation, the estimated standard errors of the parameters are robust to heteroskedastic error terms (Bramati & Croux, 2003). The robust parameter estimates, however, are still likely to be distorted.

In this paper, the main purpose is to detect outliers in panel data, not to derive consistent estimators. Thus, the endogeneity problem is not solved in this paper. A commonly used way to solve the endogeneity problem is to introduce instrumental variables into the model.

All variables except the inflation rate were taken from the European Commission AMECO database, in which the Greek data have already been revised. These variables are measured as a percentage of gross domestic product (GDP) at market prices. The inflation rate was obtained from the World Bank. The data cover the period from 2001 to 2012.

4.1 Results of Estimation

The estimated coefficients are shown in Table 1, with the standard errors of the parameters in parentheses. The t-statistics column gives the statistics for testing the significance of the coefficients, and the p-value column gives the p-values of these tests. A parameter is regarded as significant if its p-value is smaller than 0.05.

The results show that the coefficients of government debt and government fixed investment are not statistically significant. Government expenditure has a negative effect on the budget balance: increasing government expenditure tends to increase the fiscal deficit of the general government. An increase in inflation tends to increase the net lending of the government. TAX, as the main source of government revenue, has a positive effect on the budget balance. The last two columns display the LTS estimates and the traditional within group estimates. The differences among the three methods are small. The results of the classical WG estimation and the LTS estimation are close to each other. For the coefficient of government fixed investment, the difference between these two estimates and the WGM estimate is larger.

Table 1 Estimation of Parameters

| Variable | WGM | t-statistics | p-value | LTS | WG |
|---|---|---|---|---|---|
| GE | -0.842 (0.044) | -19.04 | 0.00 | -0.878 (0.031) | -0.888 (0.033) |
| TAX | 0.969 (0.124) | 7.788 | 0.00 | 0.949 (0.089) | 1.119 (0.100) |
| INF | 0.048 (0.025) | 1.929 | 0.027 | 0.02 (0.022) | 0.035 (0.027) |
| GFI | 0.024 (0.119) | 0.202 | 0.42 | 0.168 (0.1) | 0.151 (0.122) |
| GD | 0.011 (0.008) | 1.371 | 0.086 | 0.009 (0.007) | 0.028 (0.008) |

The estimated fixed effects of each country are presented in Table 2. The fixed effects are needed to compute the residuals used to detect vertical outliers. The $\hat{\alpha}_i$ are computed with equation (20) in section 3.2.

Table 2 Estimation of Fixed Effect

| Country | Fixed effect | Country | Fixed effect |
|---|---|---|---|
| Austria | 25.70 | Latvia | 16.91 |
| Belgium | 27.63 | Lithuania | 16.17 |
| Bulgaria | 15.97 | Luxembourg | 23.35 |
| Cyprus | 17.20 | Malta | 17.91 |
| Czech Republic | 21.49 | Netherlands | 25.42 |
| Denmark | 29.58 | Poland | 18.25 |
| Estonia | 18.29 | Portugal | 18.99 |
| Finland | 31.22 | Romania | 14.52 |
| France | 25.93 | Slovakia | 16.66 |
| Germany | 25.41 | Slovenia | 20.95 |
| Greece | 19.19 | Spain | 20.24 |
| Hungary | 19.99 | Sweden | 29.11 |
| Ireland | 16.07 | United Kingdom | 19.80 |
| Italy | 22.47 | | |

Figure 1 shows the scatter plot, for all countries, of the standardized residuals against the robust distances of the explanatory variables. Most observations are regular, with small standardized residuals and small robust distances.

Figure 2 shows the corresponding scatter plot for the LTS estimation. The residual-distance plot of the WGM estimation diagnoses more outliers than that of the LTS estimation. To show more detail, the scatter plot of each country is displayed in Figure 3.

Figure 3 shows the residual-distance plot of each country. Many countries contain more than one outlier while others contain none, which is a sign of block outliers. Bulgaria, Cyprus, Czech Republic, Estonia, Greece, Hungary, Ireland, Latvia, Lithuania, Luxembourg, Malta, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden and United Kingdom each contain more than one outlier, while Belgium, Finland, France, Germany, Italy and Netherlands contain none. It is still hard to tell from this plot which year is anomalous within a country, so the detailed results are provided in Table 6 in the appendix.

Figure 1 Residual Distance Plot of WGM estimation (x-axis: Robust Distance RD(x) computed from the MCD; y-axis: Standardized Robust Regression Residual)

Figure 2 Residual Distance Plot of LTS estimation (x-axis: Robust Distance RD(x) computed from the MCD; y-axis: Standardized Robust Regression Residual)

Figure 3 Residual-Distance Plot For Each Country (one panel per country: Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, United Kingdom; x-axis: Robust Distance RD(x) computed from the MCD; y-axis: Standardized Robust Regression Residuals)

Table 3 Weight of Each Country in WGM estimation

| Country | Weight | Country | Weight |
|---|---|---|---|
| Austria | 0.97 | Latvia | 0.68 |
| Belgium | 0.96 | Lithuania | 0.85 |
| Bulgaria | 0.61 | Luxembourg | 0.93 |
| Cyprus | 0.85 | Malta | 0.93 |
| Czech Republic | 0.92 | Netherlands | 0.94 |
| Denmark | 0.97 | Poland | 0.93 |
| Estonia | 0.81 | Portugal | 0.88 |
| Finland | 0.99 | Romania | 0.63 |
| France | 0.99 | Slovakia | 0.94 |
| Germany | 0.99 | Slovenia | 0.95 |
| Greece | 0.85 | Spain | 0.86 |
| Hungary | 0.76 | Sweden | 0.83 |
| Ireland | 0.72 | United Kingdom | 0.89 |
| Italy | 0.96 | | |

Table 3 displays the weight of each country used in the estimation of the parameters, i.e. the average of $W_x W_r$ over time for each country. Smaller weights correspond to countries that are more outlying relative to most countries and hence more downweighted in the estimation. Weights closer to 1 correspond to countries that are less downweighted, i.e. countries that follow the model closely. Bulgaria, Cyprus, Greece, Hungary, Ireland, Latvia, Lithuania, Romania and Sweden are more downweighted than other countries. Countries such as Finland, France, Germany, Austria and Belgium are assigned weights close to 1; that is, these countries fit the model well.

Table 6 in the appendix displays the details of the outlier detection. In the table, e denotes the standardized residual based on the WGM estimation and rd the robust Mahalanobis distance of each observation. As stated before, a standardized residual is regarded as outlying if its absolute value exceeds 2.5, and a robust distance is outlying if it exceeds the cutoff $\sqrt{\chi^2_{P,0.975}}$. wi is an indicator that equals 1 if the absolute standardized residual exceeds 2.5 and 0 otherwise. wl is an indicator that equals 1 if the robust distance exceeds the cutoff and 0 otherwise.

4.2 Diagnose Outliers outside the Sample

To investigate whether an outlier outside the sample can be diagnosed using a model based on historical data, the data for 2012 were omitted and the model was estimated on the data from 2001 to 2011. The estimated parameters are shown in Table 4. The parameters do not differ much from the results in Table 1. Figure 4 shows the result of the outlier detection. Greece is diagnosed as a bad leverage point. Sweden is diagnosed as a vertical outlier. Czech Republic, Finland, Hungary, Ireland, Latvia, Poland, Portugal, Romania, Slovenia, Spain and United Kingdom are diagnosed as good leverage points. All the outliers in the previous results have been recognized. Outlier detection outside the sample can thus serve as a reference. Of course, the model needs to be updated to keep up with changing economic circumstances.

Table 4 Estimation based on 2001-2011

| Variable | WGM | t-statistics | p-value |
|---|---|---|---|
| GE | -0.875 (0.039) | -22.408 | 0.00 |
| TAX | 0.833 (0.176) | 4.729 | 0.00 |
| INF | 0.043 (0.026) | 1.636 | 0.051 |
| GFI | 0.095 (0.12) | 0.791 | 0.215 |
| GD | 0.009 (0.012) | 0.719 | 0.236 |

Figure 4 Outlier Detection of a New Year (x-axis: Robust Distance RD(x) computed from the MCD; y-axis: Standardized Robust Regression Residual)

5. Conclusion

In 2009, the budget deficit data of Greece were found to be anomalous, and the Greek national accounts data were revised to a large extent. In this study, the budget deficit data of the 27 member states of the European Union were modeled, and outliers were detected based on a fixed effect panel data model.

Observations with large residuals are more likely to reflect fraud in the budget deficit data. That is, vertical outliers and bad leverage points deserve more attention in fraud detection. A large residual indicates that the budget deficit of the observation does not follow the regular economic mechanism that the other observations follow. Good leverage points behave abnormally in the independent variables, but their budget deficit is consistent with the values of the independent variables.

There are many causes of outliers in macroeconomic datasets, such as accounting fraud, changes in economic circumstances and political change. Although there is no effective method that detects accounting fraud specifically, it is still helpful to detect outliers based on robust estimation.

A limitation of this study is that the outlier detection is based on a static panel model. Nowadays, economists and governments emphasize the dynamics of macroeconomic data in order to summarize the long-term relationships among the data. Lucas et al. (1996) proposed a robust GMM estimator for linear dynamic panel data. The underlying idea of downweighting the residuals and instrumental variables is similar to the WGM method. This method, however, assumes N to be large, and the estimator is consistent for $N \to \infty$. This study covers only the 27 member states of the European Union, so this method is not applicable.

On the other hand, the WGM method pools the countries and thereby increases the sample size to a large extent. The estimator is consistent for $NT \to \infty$. Future studies could focus on robust estimation of the dynamic linear panel data model for small N.

Appendix

Table 5 Description of Data

| Variables | Data | Unit | Source |
|---|---|---|---|
| BD | Net lending (+) or net borrowing (-): general government | Percentage of GDP at market prices | AMECO |
| GD | General government debt | Percentage of GDP at market prices | AMECO |
| GE | Total government expenditure | Percentage of GDP at market prices | AMECO |
| TAX | Taxes on production and imports | Percentage of GDP at market prices | AMECO |
| GFI | Government fixed investment | Percentage of GDP at market prices | AMECO |
| INF | Inflation | Rate | The World Bank |

Table 6 Details of Outliers Detection

country        y     wi  wl       e     rd    country        y     wi  wl       e     rd
Austria        2001   1   0    2.68   0.18    Italy          2001   0   0   -0.59   0.73
Austria        2002   0   0    1.18   0.75    Italy          2002   0   0   -1.22   1.60
Austria        2003   0   0    0.98   0.57    Italy          2003   0   0   -0.54   0.84
Austria        2004   0   0   -0.04   1.56    Italy          2004   0   0   -1.01   0.55
Austria        2005   0   0   -0.25   0.60    Italy          2005   0   0   -1.83   0.44
Austria        2006   0   0   -0.34   1.29    Italy          2006   0   0   -0.85   0.91
Austria        2007   0   0    0.04   1.54    Italy          2007   0   0    0.59   0.69
Austria        2008   0   0    0.43   1.16    Italy          2008   0   0    1.25   0.76
Austria        2009   0   0   -0.67   0.61    Italy          2009   0   0    1.24   1.60
Austria        2010   0   0   -0.88   0.77    Italy          2010   0   0    0.54   1.78
Austria        2011   0   0   -0.41   1.04    Italy          2011   0   0    0.65   1.62
Austria        2012   0   0    0.14   1.10    Italy          2012   0   0    1.28   2.72
Belgium        2001   0   0    0.80   2.01    Latvia         2001   0   1   -1.45   8.39
Belgium        2002   0   0    0.76   1.25    Latvia         2002   0   1   -0.32   7.64
Belgium        2003   0   0    2.00   0.28    Latvia         2003   0   1   -1.51   4.53
Belgium        2004   0   0   -0.05   0.55    Latvia         2004   0   0    0.38   2.72
Belgium        2005   0   0   -0.23   1.24    Latvia         2005   0   1    0.09   4.62
Belgium        2006   0   0   -0.27   1.04    Latvia         2006   0   1    1.93   4.65
Belgium        2007   0   0   -0.56   1.42    Latvia         2007   0   1   -0.17  12.92
Belgium        2008   0   0    0.05   1.03    Latvia         2008   0   1    0.19   8.21
Belgium        2009   0   0   -1.43   1.81    Latvia         2009   0   1   -0.82   4.92
Belgium        2010   0   0   -0.83   1.19    Latvia         2010   0   1    0.00   4.77
Belgium        2011   0   0    0.23   1.36    Latvia         2011   0   1    0.00   3.70
Belgium        2012   0   0    1.10   2.05    Latvia         2012   0   1    0.65   3.72
Bulgaria       2001   1   1    5.89   6.36    Lithuania      2001   0   1   -0.35   5.55
Bulgaria       2002   1   1    3.51   5.10    Lithuania      2002   0   0   -0.73   3.57
Bulgaria       2003   0   0    2.08   3.31    Lithuania      2003   0   1   -0.70   3.63
Bulgaria       2004   0   0    2.10   2.65    Lithuania      2004   0   0   -0.22   1.25
Bulgaria       2005   0   0   -0.48   3.13    Lithuania      2005   0   0    0.81   2.94
Bulgaria       2006   1   1   -2.92   4.31    Lithuania      2006   0   0    0.95   3.27
Bulgaria       2007   0   1    1.41   6.27    Lithuania      2007   0   1    0.80   6.04
Bulgaria       2008   0   1    0.48   6.90    Lithuania      2008   0   1    0.70   6.36
Bulgaria       2009   0   1   -0.66   3.79    Lithuania      2009   0   1    1.03   5.90
Bulgaria       2010   1   1   -2.77   3.62    Lithuania      2010   0   0    0.22   3.11
Bulgaria       2011   1   0   -2.96   1.72    Lithuania      2011   0   0   -1.19   3.51
Bulgaria       2012   0   0   -1.86   2.43    Lithuania      2012   0   0   -0.56   2.37
Cyprus         2001   0   1   -0.70   4.75    Luxembourg     2001   0   1    2.23   3.71
Cyprus         2002   0   1   -1.49   4.22    Luxembourg     2002   0   0    1.27   2.72
Cyprus         2003   1   0   -2.68   2.23    Luxembourg     2003   0   0   -0.13   2.35
Cyprus         2004   1   0   -2.71   3.22    Luxembourg     2004   0   0   -2.01   1.70
Cyprus         2005   0   0   -0.11   1.19    Luxembourg     2005   0   0   -1.81   1.58
Cyprus         2006   0   0    0.11   1.99    Luxembourg     2006   1   0   -2.54   2.47
Cyprus         2007   1   1    2.51   4.28    Luxembourg     2007   0   0   -1.77   2.81
Cyprus         2008   0   1    1.39   3.74    Luxembourg     2008   0   0    0.51   3.47
Cyprus         2009   0   1    0.19   3.61    Luxembourg     2009   0   0    1.48   2.67
Cyprus         2010   0   0    0.68   2.19    Luxembourg     2010   0   1   -0.06   3.61
Cyprus         2011   0   0    0.28   1.94    Luxembourg     2011   0   0    0.06   1.79
Cyprus         2012   0   0   -0.13   2.71    Luxembourg     2012   0   0    0.48   1.70
Czech Republic 2001   0   1   -0.99   4.08    Malta          2001   1   0   -3.05   1.85
Czech Republic 2002   0   1   -0.22   3.94    Malta          2002   0   0   -1.86   2.41
Czech Republic 2003   1   1    3.54   6.41    Malta          2003   0   0   -2.02   3.17
Czech Republic 2004   0   0    0.78   1.72    Malta          2004   0   0   -0.03   1.25
Czech Republic 2005   0   0    0.43   1.65    Malta          2005   0   0    0.91   2.99
Czech Republic 2006   0   0    0.91   1.75    Malta          2006   0   0    0.73   1.48
Czech Republic 2007   0   0    1.47   1.62    Malta          2007   0   0    0.11   1.08
Czech Republic 2008   0   0    0.16   2.03    Malta          2008   0   1   -0.59   3.83
Czech Republic 2009   0   0   -1.17   2.61    Malta          2009   0   0    0.03   3.32
Czech Republic 2010   0   0   -0.76   2.59    Malta          2010   0   1   -0.19   3.70
Czech Republic 2011   0   0   -0.16   3.03    Malta          2011   0   0    0.10   2.87
Czech Republic 2012   0   0   -0.84   3.46    Malta          2012   0   0    1.51   1.31
Denmark        2001   0   0    0.13   0.88    Netherlands    2001   0   0   -0.27   2.38
Denmark        2002   0   0   -0.94   0.66    Netherlands    2002   0   0   -1.37   1.62
Denmark        2003   0   0   -0.56   0.98    Netherlands    2003   0   0   -1.67   0.71
Denmark        2004   0   0    1.00   0.17    Netherlands    2004   0   0   -1.27   0.97
Denmark        2005   1   0    2.59   1.00    Netherlands    2005   0   0   -0.95   0.76
Denmark        2006   0   0    1.35   1.66    Netherlands    2006   0   0    0.60   0.79
Denmark        2007   0   0    0.39   2.08    Netherlands    2007   0   0    0.13   0.89
Denmark        2008   0   0   -0.13   1.83    Netherlands    2008   0   0    1.53   1.01
Denmark        2009   0   0   -0.22   2.42    Netherlands    2009   0   0   -0.03   2.24
Denmark        2010   0   0   -0.46   2.42    Netherlands    2010   0   0    0.03   1.99
Denmark        2011   0   0    0.21   1.78    Netherlands    2011   0   0    0.08   1.71
Denmark        2012   0   0   -0.52   2.52    Netherlands    2012   0   0    1.11   2.23
Estonia        2001   0   0   -1.71   1.99    Poland         2001   0   0    0.65   2.90
Estonia        2002   0   0   -0.52   2.37    Poland         2002   0   0    0.65   2.36
Estonia        2003   0   0    0.65   1.59    Poland         2003   0   0   -0.26   3.08
Estonia        2004   0   0   -0.10   2.38    Poland         2004   0   0   -1.17   2.00
Estonia        2005   0   0   -1.72   1.67    Poland         2005   0   0    0.37   1.80
Estonia        2006   0   0   -1.03   3.27    Poland         2006   0   0    0.66   1.56
Estonia        2007   0   1   -0.87   5.66    Poland         2007   0   0    1.02   1.32
Estonia        2008   0   0    0.10   3.26    Poland         2008   0   0   -0.06   1.77
Estonia        2009   1   1    3.67   6.22    Poland         2009   0   0   -1.69   3.57
Estonia        2010   1   1    2.65   4.90    Poland         2010   0   1   -2.15   4.37
Estonia        2011   0   0    1.10   2.42    Poland         2011   0   1   -1.14   5.26
Estonia        2012   0   0    0.93   2.63    Poland         2012   0   0    0.06   2.82
Finland        2001   0   0    1.07   1.81    Portugal       2001   0   1   -1.03   3.71
Finland        2002   0   0    0.69   0.90    Portugal       2002   0   0   -0.33   2.65
Finland        2003   0   0   -0.35   1.70    Portugal       2003   0   0    0.25   1.78
Finland        2004   0   0   -0.47   1.03    Portugal       2004   0   0    1.25   1.52
Finland        2005   0   0    0.09   0.84    Portugal       2005   0   0   -1.55   1.25
Finland        2006   0   0    0.66   0.94    Portugal       2006   0   0   -1.05   2.00
Finland        2007   0   0    0.89   1.79    Portugal       2007   0   0    0.22   1.84
Finland        2008   0   0    1.76   1.83    Portugal       2008   0   0    0.58   1.27
Finland        2009   0   0   -0.11   2.85    Portugal       2009   0   0   -0.76   2.49
Finland        2010   0   0   -0.43   2.56    Portugal       2010   0   0    0.51   3.05
Finland        2011   0   0   -0.09   2.93    Portugal       2011   1   1    4.14   4.37
Finland        2012   0   0   -0.21   3.39    Portugal       2012   0   1   -0.22   7.04
France         2001   0   0    0.87   1.36    Romania        2001   0   1   -0.74  19.00
France         2002   0   0   -0.05   1.31    Romania        2002   0   1    0.22   8.58
France         2003   0   0   -0.56   0.72    Romania        2003   0   1   -1.42   8.67
France         2004   0   0   -0.49   0.41    Romania        2004   0   1    0.13   5.00
France         2005   0   0    0.40   0.43    Romania        2005   0   0   -0.88   3.22
France         2006   0   0    0.68   0.34    Romania        2006   0   0   -0.05   2.21
France         2007   0   0    0.04   0.70    Romania        2007   0   0    2.14   3.19
France         2008   0   0    0.18   0.80    Romania        2008   0   1    0.51   5.02
France         2009   0   0   -1.54   1.64    Romania        2009   0   1    0.05   6.82
France         2010   0   0   -0.98   1.72    Romania        2010   0   1    0.20   5.32
France         2011   0   0   -0.04   2.07    Romania        2011   0   1   -0.25   6.44
France         2012   0   0    0.93   2.62    Romania        2012   0   1   -0.25   5.87
Germany        2001   0   0    0.64   1.46    Slovakia       2001   1   1    2.75   3.75
Germany        2002   0   0   -0.01   1.34    Slovakia       2002   0   1    1.46   3.72
Germany        2003   0   0   -0.05   1.11    Slovakia       2003   0   0    2.26   2.59
Germany        2004   0   0   -0.71   0.72    Slovakia       2004   0   0    0.17   2.72
Germany        2005   0   0   -0.29   0.87    Slovakia       2005   0   0   -0.05   2.13
Germany        2006   0   0    0.18   1.30    Slovakia       2006   0   0   -0.52   1.35
Germany        2007   0   0    0.01   1.57    Slovakia       2007   0   0   -0.95   2.65
Germany        2008   0   0    0.25   1.51    Slovakia       2008   0   0    0.05   2.25
Germany        2009   0   0    0.02   1.20    Slovakia       2009   0   1   -0.17   3.88
Germany        2010   0   0   -1.18   1.81    Slovakia       2010   0   0   -1.21   2.33
Germany        2011   0   0   -0.12   2.47    Slovakia       2011   0   0   -0.28   1.16
Germany        2012   0   0    0.50   2.70    Slovakia       2012   0   0    0.33   2.42
Greece         2001   0   0    0.45   1.37    Slovenia       2001   0   1   -1.05   4.42
Greece         2002   0   0    0.36   0.94    Slovenia       2002   0   1   -0.67   3.86
Greece         2003   0   0   -0.04   1.48    Slovenia       2003   0   0   -0.93   2.51
Greece         2004   0   0   -1.01   1.63    Slovenia       2004   0   0   -0.55   1.13
Greece         2005   0   0    0.04   1.78    Slovenia       2005   0   0    0.03   1.86
Greece         2006   0   0   -0.34   0.89    Slovenia       2006   0   0    0.05   1.45
Greece         2007   0   0    0.63   0.76    Slovenia       2007   0   0   -0.03   3.26
Greece         2008   0   0   -0.04   2.38    Slovenia       2008   0   0    0.11   3.25
Greece         2009   0   0   -2.31   3.40    Slovenia       2009   0   0   -0.33   3.43
Greece         2010   0   1   -0.26   4.77    Slovenia       2010   0   1    0.81   3.69
Greece         2011   0   1    0.76   7.46    Slovenia       2011   0   0    0.64   2.41
Greece         2012   1   1    3.52   6.48    Slovenia       2012   0   0    1.29   3.38
Hungary        2001   0   1   -0.11   5.10    Spain          2001   0   0    0.12   0.91
Hungary        2002   0   1   -1.51   5.14    Spain          2002   0   0    0.53   0.57
Hungary        2003   0   0   -2.04   1.22    Spain          2003   0   0   -0.19   0.50
Hungary        2004   0   0   -2.07   0.93    Spain          2004   0   0    0.08   1.35
Hungary        2005   0   0   -1.93   2.02    Spain          2005   0   0    0.68   1.81
Hungary        2006   0   0   -1.22   2.78    Spain          2006   0   0    1.78   2.06
Hungary        2007   0   0    1.16   0.92    Spain          2007   1   0    3.20   2.12
Hungary        2008   0   0    1.62   1.74    Spain          2008   0   0   -0.08   3.24
Hungary        2009   0   0    1.64   2.00    Spain          2009   0   1   -2.01   5.10
Hungary        2010   0   0    0.11   2.82    Spain          2010   0   0   -2.12   3.47
Hungary        2011   1   0   10.30   2.62    Spain          2011   0   1   -2.19   3.97
Hungary        2012   0   1    0.19   3.91    Spain          2012   0   1   -2.10   7.02
Ireland        2001   0   1    0.56   3.73    Sweden         2001   0   0    2.36   1.55
Ireland        2002   0   0   -0.75   3.23    Sweden         2002   0   0   -0.11   1.03
Ireland        2003   0   0   -0.21   1.39    Sweden         2003   0   0    0.11   1.16
Ireland        2004   0   0    0.70   1.25    Sweden         2004   0   0    0.86   1.18
Ireland        2005   0   0    0.73   1.72    Sweden         2005   0   0    2.17   0.84
Ireland        2006   0   0    2.11   2.43    Sweden         2006   0   0    1.09   0.46
Ireland        2007   0   0    1.83   2.93    Sweden         2007   0   0    0.80   1.19
Ireland        2008   0   1    0.21   5.16    Sweden         2008   0   0   -1.66   2.14
Ireland        2009   0   1   -0.77   6.34    Sweden         2009   1   0   -2.68   3.03
Ireland        2010   1   1   -4.11  12.73    Sweden         2010   1   0   -3.01   2.01
Ireland        2011   0   1   -1.50   8.01    Sweden         2011   1   0   -4.85   2.88
Ireland        2012   0   1   -0.64   9.82    Sweden         2012   1   0   -4.70   2.82
United Kingdom 2001   0   0    1.39   1.86
United Kingdom 2002   0   0   -0.56   1.27
United Kingdom 2003   0   0   -1.12   0.95
United Kingdom 2004   0   0   -0.45   0.47
United Kingdom 2005   0   0    0.77   3.21
United Kingdom 2006   0   0    1.66   0.68
United Kingdom 2007   0   0    1.44   0.71
United Kingdom 2008   1   0    2.87   2.19
United Kingdom 2009   0   1   -0.55   3.80
United Kingdom 2010   0   1   -1.39   4.43
United Kingdom 2011   0   1   -1.17   4.93
United Kingdom 2012   0   1    0.45   5.46
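The flags in Table 6 can be cross-checked against the reported e and rd values. The sketch below is illustrative only and rests on assumptions that this appendix does not state: that e is a standardized residual and rd a robust distance, that wi marks observations with |e| above 2.5, and that wl marks observations with rd above the square root of the 0.975 chi-square quantile with 5 degrees of freedom (one per regressor in Table 5), roughly 3.58. The function name flag and both cutoffs are hypothetical; the sample rows are copied from Table 6.

```python
import math

# Assumed cutoffs, inferred from the pattern of flags in Table 6 (not stated here).
RESIDUAL_CUTOFF = 2.5
# 0.975 quantile of the chi-square distribution with 5 degrees of freedom,
# hard-coded so the sketch needs no statistics library.
CHI2_5_0975 = 12.8325
DISTANCE_CUTOFF = math.sqrt(CHI2_5_0975)  # roughly 3.58


def flag(e, rd):
    """Return (wi, wl) for a residual e and a robust distance rd."""
    wi = int(abs(e) > RESIDUAL_CUTOFF)   # large residual
    wl = int(rd > DISTANCE_CUTOFF)       # large distance in the regressor space
    return wi, wl


# (country, year, wi, wl, e, rd), copied from Table 6.
rows = [
    ("Austria", 2001, 1, 0, 2.68, 0.18),
    ("Bulgaria", 2001, 1, 1, 5.89, 6.36),
    ("Cyprus", 2009, 0, 1, 0.19, 3.61),
    ("Germany", 2005, 0, 0, -0.29, 0.87),
    ("Hungary", 2011, 1, 0, 10.30, 2.62),
    ("Latvia", 2007, 0, 1, -0.17, 12.92),
    ("Poland", 2009, 0, 0, -1.69, 3.57),
]

# Rows whose reported flags differ from the flags implied by the assumed cutoffs.
mismatches = [
    (country, year)
    for country, year, wi, wl, e, rd in rows
    if flag(e, rd) != (wi, wl)
]
print("rows not reproduced by the assumed cutoffs:", mismatches)
```

Under these assumptions an observation with wi = 1 and wl = 0 (for example Hungary 2011) has a large residual at an ordinary regressor position, while wi = 0 and wl = 1 (for example Latvia 2007) marks an unusual regressor position that the fitted model still explains well.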


References

Beaton, A. E., & Tukey, J. W. (1974). The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics, 16, 147-185.

Bramati, M., & Croux, C. (2003). Robust estimators for the fixed effects panel data model. DTEW Research Report 0336, 1-25.

Fischer, S., & Easterly, W. (1990). The economics of the government budget constraint. The World Bank Research Observer, 5(2), 127-142.

Fox, A. J. (1972). Outliers in time series. Journal of the Royal Statistical Society. Series B (Methodological), 350-363.

Kurt, S., Gunes, C., & Davasligil, V. (2012). The Effect of Global Financial Crisis on Budget Deficits in European Countries: Panel Data Analysis. Istanbul University Econometrics and Statistics e-Journal, 17(1), 1-22.

Lucas, A., Van Dijk, R., & Kloek, T. (1996). Outlier robust GMM estimation of leverage determinants in linear dynamic panel data models. Unpublished paper.

Mallows, C. L. (1975). On some topics in robustness. Unpublished memorandum, Bell Tel. Laboratories, Murray Hill.

Muirhead, C. R. (1986). Distinguishing outlier types in time series. Journal of the Royal Statistical Society. Series B (Methodological), 48(1), 39-47.

Nelson, R. M., Belkin, P., & Mix, D. E. (2010). Greece's Debt Crisis: Overview, Policy Responses, and Implications. Retrieved 26th Dec, from www.crs.gov

Wagenvoort, R., & Waldmann, R. (2002). On B-robust instrumental variable estimation of the linear model with panel data. Journal of Econometrics, 106(2), 297-324.


Peña, D., & Prieto, F. J. (2001). Multivariate Outlier Detection and Robust Covariance Matrix Estimation. Technometrics, 286-300.

Rauch, B., Göttsche, M., Brähler, G., & Engel, S. (2011). Fact and Fiction in EU-Governmental Economic Data. German Economic Review, 12(3), 243-255.

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79(388), 871-880.

Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: John Wiley.

Rousseeuw, P. J., & Driessen, K. V. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3), 212-223.

Rousseeuw, P. J., & Van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411), 633-639.

Tsay, R. S. (1988). Outliers, level shifts, and variance changes in time series. Journal of Forecasting, 7(1), 1-20.

Tsay, R. S., Peña, D., & Pankratz, A. E. (2000). Outliers in multivariate time series. Biometrika, 87(4), 789-804.
