
Regional Rainfall Frequency Analysis


Academic year: 2021


Regional Rainfall Frequency Analysis Using Eleven Sites of Skåne, Sweden

Swedish title: Regional Regnfrekvensanalys av Elva Platser i Skåne, Sverige

Daniel Bezaatpour and Olov Rudberg

Department of Statistics

Sweden


Abstract

Frequency analysis is a vital tool for finding a well-suited probability distribution with which to predict extreme rainfall. The regional frequency approach has been used to determine homogeneous regions, using 11 sites in Skåne, Sweden. To describe maximum annual daily rainfall, the Generalized Logistic (GLO), Generalized Extreme Value (GEV), Generalized Normal (GNO), Pearson Type III (PE3), and Generalized Pareto (GPA) distributions have been considered. The method of L-moments has been used to estimate the parameters of the candidate distributions. Heterogeneity measures, goodness-of-fit tests, and accuracy measures have been applied in order to estimate quantiles accurately for the 1-, 5-, 10-, 50-, and 100-year return periods. It was found that the whole province of Skåne could be considered homogeneous. The GEV distribution was the most consistent with the data, followed by the GNO distribution, and both were used to estimate quantiles for the return periods. The GEV distribution generated the most precise estimates, with the lowest relative RMSE; hence, it was concluded to be the best-fit distribution for maximum annual daily rainfall in the province.


Dedication

This thesis is dedicated to our families and close friends, who provided us with unfailing support throughout our years of study and through the process of writing this thesis. This accomplishment would not have been possible without them.


Acknowledgements

We would like to express our profound gratitude to our supervisor, Mahmood Ul Hassan, Ph.D., for providing much-needed guidance. His dedication and knowledge were greatly appreciated and contributed much to the completion of this thesis.

We would also like to thank the Department of Statistics at Stockholm University for providing us with the resources needed for the completion of our thesis.


Contents

1 Introduction
2 Description of Study Area and Data Set
2.1 Objectives of Study
2.2 Data Set
2.3 Disposition
2.4 Limitations
3 Theory Review
3.1 Frequency Analysis
3.2 Previous Research
3.3 Probability Distribution Functions
3.4 L-moments
4 Methodology
4.1 Screening of Data
4.1.1 Discordance Measure
4.2 Identification of Homogeneous Regions
4.2.1 Heterogeneity Measure
4.2.2 Cluster Analysis
4.3 Choice of Frequency Distribution
4.3.1 Goodness-of-Fit Measure
4.4 Estimation of Frequency Distribution
4.4.1 Regional L-moment Algorithm
4.4.2 Accuracy Measures
5 Results and Discussion
5.1 Discordant Measures
5.2 Heterogeneity Measures
5.3 Goodness-of-fit Measures
5.4 Quantile Estimates and Accuracy Measures
6 Conclusion
References
A Summary of Observed Data and Formulas


List of Figures

2.1 Geographical location of the province and the 11 gauged sites.
5.1 Growth curves of the GLO, GEV, GPA, GNO, and PE3 distributions.
A.1 Frequency histogram of maximum annual daily rainfall measured in mm.
B.1 L-CV plotted against L-skewness for the 11 sites.
B.2 L-kurtosis plotted against L-skewness for the 11 sites.
B.3 Regional growth curve with 95% error bounds for the GEV distribution.
B.4 Regional growth curve with 95% error bounds for the GNO distribution.


List of Tables

2.1 At-site summary statistics for the 11 sites.
5.1 At-site descriptive results together with L-moments and discordance of the 11 sites.
5.2 Descriptive statistics of L-CV together with the H1-statistic.
5.3 Goodness-of-fit tests using 1000 Monte Carlo simulations. $\hat{\mu}$ denotes the location, $\hat{\alpha}$ the scale, and $\hat{k}$ the shape parameter.
5.4 At-site quantile estimates, $\hat{Q}_i(F) = l_1^{(i)}\hat{q}(F)$, with 95% confidence bounds and relative RMSE of the return periods based on the GEV distribution, measured in mm.
A.1 Regular quantiles of precipitation for the 11 sites derived from observed data and measured in mm.
A.2 Probability density functions and the corresponding quantile functions.
B.1 Descriptive statistics of L-CV and L-skewness together with the H2-statistic.
B.2 Descriptive statistics of L-skewness and L-kurtosis together with the H3-statistic.
B.3 At-site quantile estimates, $\hat{Q}_i(F) = l_1^{(i)}\hat{q}(F)$, with 95% confidence bounds and relative RMSE of the return periods based on the GNO distribution, measured in mm.
B.4 Regional relative RMSE for the different return periods measured in


Chapter 1

Introduction

Extreme rainfall is one of the major causes of natural disasters and causes damage throughout the world. Consequences such as dam failures and damage to both agricultural land and infrastructure may result from this natural hazard. For various hydrological studies, maximum annual daily rainfall together with the frequency of rainfall for a given region or basin is often required. Rainfall magnitudes for particular recurrence intervals are estimated by frequency analysis, which is used extensively in fields such as city planning, water resources management, highway construction, and the design of railway culverts (Griffiths & Clausen, 1997). Hence, finding a well-fitting distribution for time-dependent rainfall has long been a topic of interest in meteorology and hydrology. Numerous measurement techniques and probability distribution functions have been used over the years to calculate the probability of rainfall for a given return period.

We intend to study the province of Skåne, which is situated in the southernmost part of Sweden. It covers an area of 11,303 km², includes large cities such as Malmö, Lund, and Helsingborg, and was home to about 13.3% of the total population of Sweden as of 2019 (SCB, 2020). Skåne borders the provinces of Blekinge, Småland, and Halland and is considered the warmest province in Sweden. It is characterised by its flat landscape and large agricultural fields, making it home to a large proportion of Sweden's farmers (Jordbruksverket, 2019). Rainfall is crucial for agriculture, which means that the province is very dependent on it and very sensitive to natural disasters such as floods, but also to droughts and irregular rainfall. During the summer of 2014, Skåne witnessed one of its worst floods ever recorded. More than 100 mm of precipitation paralyzed the southwestern parts of the province, leaving flooded highways, power failures, and damage to property and railways (SMHI, 2015). The use of one-day data is therefore important, since extreme precipitation accumulated over 24 hours is enough to create flash floods. However, rainfall in Sweden is usually not extreme enough to make sub-daily data necessary, which is why we limit the study to daily rainfall.

Almost all parts of the province have gauged sites where precipitation has been recorded, with some records starting in the second half of the 19th century and others during the 20th century, leaving very few areas ungauged. According to Hosking and Wallis (1997), two-parameter probability distributions produce heavily biased quantile estimates in the extreme upper tail of the distribution. Therefore, Hosking and Wallis (1997) stated that three-parameter probability distributions are more appropriate for most applications of regional frequency analysis. In this study, we consider five widely used three-parameter probability distributions, namely the Generalized Pareto (GPA), Generalized Normal (GNO), Pearson Type III (PE3), Generalized Logistic (GLO), and Generalized Extreme Value (GEV) distributions, to identify the best-fit distribution for the 11 gauging sites in Skåne.


Chapter 2

Description of Study Area and Data Set

2.1 Objectives of Study

The objectives of the study are as follows:

1. To form homogeneous regions (groups of sites) based on maximum annual daily rainfall data series of 11 gauging sites in Skåne.

2. To find the best-fit regional frequency distribution for estimating rainfall quantiles for different return periods of the region.

3. To estimate the magnitude of rainfall at each site using the best-fit regional frequency distribution for different return periods.

4. Finally, to use Monte Carlo simulations to provide the relative root mean square error (RMSE) and 95% confidence intervals for regional and at-site quantile estimates, in order to assess the accuracy of the estimates.

2.2 Data Set

The data were collected from the Swedish Meteorological and Hydrological Institute (SMHI). SMHI is a Swedish government agency operating under the Ministry of the Environment, with expertise in meteorology, hydrology, and oceanography, which makes it a highly suitable source of data for our study.

Maximum annual daily rainfall data were used, meaning that the highest daily precipitation measured in each year was extracted. The data were collected from 11 gauging sites in the province of Skåne, where the starting year of measurement ranges between 1863 and 1961. These sites cover the whole province of Skåne, from coastal areas to the inland. The left panel of Figure 2.1 shows the location of Skåne in Sweden, and the right panel shows where the gauging sites are located within the province. Furthermore, Table 2.1 presents the record period of each site together with its altitude, latitude, longitude, and average temperature.
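The extraction step described above can be sketched as follows, assuming the daily records are available as (date, rainfall in mm) pairs. The function name `annual_maxima` and the sample values are hypothetical and are not taken from the SMHI data set.

```python
from collections import defaultdict
from datetime import date


def annual_maxima(daily_records):
    """Return {year: highest daily rainfall in mm} from (date, mm) pairs.

    Missing observations (None) are skipped, so a year appears in the
    result only if it has at least one observed day.
    """
    maxima = defaultdict(lambda: float("-inf"))
    for day, rainfall_mm in daily_records:
        if rainfall_mm is None:
            continue
        maxima[day.year] = max(maxima[day.year], rainfall_mm)
    return dict(sorted(maxima.items()))


# Hypothetical daily observations (mm), for illustration only.
records = [
    (date(2018, 6, 1), 3.2),
    (date(2018, 8, 14), 41.5),
    (date(2018, 11, 2), None),
    (date(2019, 7, 20), 27.0),
    (date(2019, 7, 21), 12.4),
]
print(annual_maxima(records))  # {2018: 41.5, 2019: 27.0}
```

How years with many missing days should be treated is a separate decision; a year with a long gap may understate its true maximum, which relates to the data-continuity limitation discussed in Section 2.4.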

[Figure 2.1: Geographical location of the province and the 11 gauged sites.]

Site name | Record period | Altitude (m) | Latitude | Longitude | Avg. temp. (°C)
Bromölla | 1961-2019 | 15 | 56.0783 | 14.4749 | 7.91
Klippan | 1945-2019 | 20 | 56.1207 | 13.4552 | 7.29
Kristianstad | 1879-2019 | 10 | 56.3300 | 14.1500 | 8.06
Landskrona | 1945-2019 | 5 | 55.8760 | 12.8438 | 9.06
Lund | 1863-2019 | 50 | 55.7089 | 13.2026 | 8.17
Malmö | 1927-2019 | 5 | 55.6093 | 13.0817 | 8.20
Osby | 1954-2019 | 85 | 56.3910 | 13.9843 | 6.56
Tomelilla | 1959-2019 | 67 | 55.5533 | 13.9450 | 8.48
Trelleborg | 1945-2019 | 5 | 55.3810 | 13.1279 | 8.24
Vomb | 1945-2019 | 25 | 55.6616 | 13.5323 | 7.48
Ystad | 1949-2019 | 32 | 55.4410 | 13.8278 | 7.45

Table 2.1: At-site summary statistics for the 11 sites.

2.3 Disposition

This thesis is structured in the following order of chapters:

Chapter 1: Introduction - This chapter gives an introduction to the research and study area.

Chapter 2: Description of study area and data set - This chapter covers the objectives and limitations of the study together with a presentation of the data set.

Chapter 3: Theory Review - This chapter reviews previous research on the subject of the study along with the theory implemented.

Chapter 4: Methodology - This chapter covers the methods used for data processing and analysis.

Chapter 5: Results and discussion - In this chapter the results are presented and further discussed.

Chapter 6: Conclusions - This chapter summarizes the findings of the study.

2.4 Limitations

This study analyses rainfall data for 11 sites across the province of Skåne. Not all sites have continuous monitoring, and the sites differ somewhat in their starting period: some stations were installed in the second half of the 19th century, while others were installed during the 20th century, most of them around its middle. The foremost limitation of this study is the lack of data continuity, along with the different record lengths of the sites. Moreover, the measuring instruments used in the early years differ in technology and precision from those used from the late 1970s onward, which might introduce measurement discrepancies over the study period. However, for the convenience of this study, we assume equal accuracy of all collected data.


Chapter 3

Theory Review

3.1 Frequency Analysis

Frequency analysis is the estimation of how often a specified event will occur. In frequency analysis, a probability distribution is used to relate the magnitude of extreme events to their frequency of occurrence (Hosking & Wallis, 1997). Estimating the frequency of extreme events is often of great importance; here, knowledge of rainfall patterns and quantities in a certain area is important for water resources management. Since extreme rainfall is one of the main causes of flooding, anticipating extreme rainfall is vital for better flood management (Alam et al., 2018).
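In annual-maximum analysis, a T-year return period is commonly converted into the annual non-exceedance probability F = 1 - 1/T, at which a fitted quantile function is then evaluated; the 100-year event, for instance, corresponds to F = 0.99. A minimal sketch assuming this standard convention (the helper names are ours, not from the thesis):

```python
def non_exceedance_probability(return_period_years: float) -> float:
    """Annual non-exceedance probability F = 1 - 1/T for a T-year event."""
    if return_period_years < 1:
        raise ValueError("the return period must be at least 1 year")
    return 1.0 - 1.0 / return_period_years


def return_period(f: float) -> float:
    """Inverse relation T = 1 / (1 - F), defined for 0 <= F < 1."""
    if not 0.0 <= f < 1.0:
        raise ValueError("F must satisfy 0 <= F < 1")
    return 1.0 / (1.0 - f)


for t in (5, 10, 50, 100):
    print(f"T = {t:>3} years  ->  F = {non_exceedance_probability(t):.2f}")
```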

The regional approach to frequency analysis has a few benefits. According to Hosking and Wallis (1997), regional frequency analysis allows three-parameter distributions to be estimated more reliably than when only single-site data are used, and more precise estimates of the distribution in turn give more accurate quantiles for the return periods. Another advantage of regionalization is its robustness against data discrepancies: Lucas et al. (2012), among others, found that a region as a whole is less affected when the sites have different record lengths or gaps in their observations.

3.2 Previous Research

This study draws on multiple papers on frequency analysis that determine the best-fit probability distribution for maximum rainfall using different statistical analyses and distribution types. However, since conditions differ across the globe, it is only natural that different studies reach different conclusions.

Fowler and Kilsby (2003) presented a regional frequency analysis of extreme rainfall between 1961 and 2000 in order to find the best-fit probability distribution. They examined the variability of extreme rainfall quantities at 204 sites across the United Kingdom. The data were divided into 1-, 2-, 5-, and 10-day durations to obtain more precise results when growth curves based on the Generalized Extreme Value (GEV) distribution were estimated. To examine whether the sites exhibited homogeneity in their distribution, they used the L-moment-based discordance measure introduced by Hosking and Wallis (1997). The L-moment ratios (L-CV, L-skewness, and L-kurtosis) were derived for the individual sites and then compared with the regional average, weighted according to record length.

Nine homogeneous regions were found. The northern parts of the UK were more prone to changes in extreme rainfall over the years, especially for the 5- and 10-day durations, which led to higher variability and L-CV. In Scotland, both the amount of rainfall and the duration of the events had increased, making it one of the regions with the highest growth curves; this makes rain occurrence harder to predict and infrastructure maintenance and water resources management harder to handle. The amount of rainfall had increased by over 70% in the eastern part of Scotland, while it had decreased by approximately 20% in the southern part of England within the last nine years. Finally, it was concluded that the Generalized Extreme Value distribution was the most suitable distribution for the northern parts of the United Kingdom.

Two other researchers studying the variability of rainfall frequency were Parida and Moalafhi (2008). They identified climate variability using 11 sites across Botswana. Their goal was to examine whether the whole country could be treated as one region, and they hypothesized that the rainfall frequency followed a single “parent” distribution. They conducted an intervention analysis, a form of causal inference on time series data, with the intervention break placed between 1981 and 1982. After dividing the data into two groups, they used a t-test with pooled variance on the split series to bring the two parts onto common ground. The L-moment technique was applied to the regional data, which not only helped with the bias problems of small samples but also helped identify the underlying statistical model through a goodness-of-fit test. The bias problems of small samples have also been studied by Arnaud et al. (2002). The estimation of a well-suited probability distribution and the Hosking-Wallis heterogeneity test were both conducted through Monte Carlo simulations.

Their study showed that between 1961 and 1981 the average rainfall increased, only to decline towards the end of the study period in 2003. All of the sites in the regional analysis showed homogeneous behavior in frequency, indicating that the whole region followed a single probability distribution, in this case the Generalized Extreme Value distribution.

Hassan and Ping (2012) faced a similar problem of heterogeneous distributions across various regions of the Luanhe basin in northern China. Extreme annual rainfall data between 1932 and 1970 were analyzed, and sites were selected only if at least seven years of historical rainfall data were available. The average record length of the sites was 20 years, ranging between 12 and 37 years. After an initial screening of the data, 17 gauged sites were selected.

The discordance measures were generated by first applying L-moments to five three-parameter distributions: the Generalized Logistic, Generalized Extreme Value, Generalized Normal, Pearson Type III, and Generalized Pareto distributions. They found that none of the four regions had a discordant site. However, when performing the Hosking-Wallis heterogeneity test on the four regions combined, they concluded that no main distribution could be identified. Further, by performing cluster analysis, they could easily determine new homogeneous regions: seven new regions were found by applying Ward's method. The same approach was taken by Gottschalk (1985) when studying hydrologic regionalization in Sweden.

It was found that one site within the cluster was discordant due to one extreme rainfall event. However, it was concluded that this event was equally likely to occur at any of the sites rather than being specific to one site, and according to the theory provided by Hosking and Wallis (1997) such a site is therefore not considered an anomaly. In their conclusion, the Generalized Pareto distribution could be applied to most of the cluster regions except for the central parts of the Luanhe basin, where the Generalized Extreme Value distribution was more applicable. Moreover, no overall distribution could be determined for the entire basin.

In a later paper, Hussain (2017) estimated flood quantiles using regional frequency analysis. He analyzed time series data on multiple site characteristics in Punjab, Pakistan. The data series covered 30 years (1961-1990) at 11 gauged sites near the four major rivers in the province and consisted of annual maximum peak flows. The method of L-moments was used to apply the discordance and heterogeneity measures proposed by Hosking and Wallis (1997) to each site. He found two homogeneous regions, with nine sites in the first and two river-based sites in the second. The first region was best described by the Generalized Normal distribution, while the Pearson Type III distribution was the best fit in the second region. Knowing the regional probability distribution, Hussain was able to accurately predict quantiles of annual maximum peak flows at ungauged sites, which could further be used for water resources management and infrastructure planning.

Alam et al. (2018) also studied the estimation of quantiles based on frequency analysis. The purpose of their study was to determine the best-fit probability distribution and to extract the expected maximum monthly rainfall for return periods of 10, 25, 50, and 100 years. They used maximum monthly rainfall collected between 1984 and 2013 at 35 locations throughout Bangladesh. Both the method of moments and the method of L-moments were applied when estimating the parameters of the sites' distributions. To determine a candidate distribution, the authors used both the bootstrap Anderson-Darling and the Kolmogorov-Smirnov goodness-of-fit tests, together with visual inspection of Q-Q plots. The candidate distributions were the Normal, Log-Normal (LN), Pearson Type III (PE3), Log-Pearson Type III (LP3), Exponential, Gumbel (GUM), Generalized Extreme Value (GEV), Weibull, and Generalized Pareto (GPA) distributions.

They concluded that the precipitation varied greatly due to site-characteristics such as elevation and location. According to the goodness-of-fit tests, the GEV distribution could be fitted to approximately 36% of the sites while the PE3 and LP3 distributions each fitted 26% of the sites. The presentation of these three distributions gave a clear view of the range of projected return periods. The 10-, 25-, 50-, and 100-year return periods of maximum monthly rainfall were successfully forecasted for all locations.

In another study, where three different probability distributions were found to fit best, Lee (2005) scrutinized the characteristics of the rainfall distribution in southern Taiwan. Annual rainfall data from a 10-year period were collected from 178 gauging sites to perform frequency analysis. Five frequency distributions were applied, and the best-fit distribution in the area was found to be the Log-Pearson Type III distribution, followed by the Log-Normal and Pearson Type III distributions.


In a later study, Malekinezhad and Zare-Garizi (2014) applied regional frequency analysis to maximum daily rainfall in northeastern Iran. The aim of the study was to find adequate regional frequency distributions for maximum daily rainfall and to predict the return values of extreme rainfall events. Maximum rainfall records were collected from 47 gauging sites across the study area. Thereafter, L-moment regionalization procedures together with an index rainfall method were applied to the maximum rainfall data. After a cluster analysis of the site characteristics and tests for regional homogeneity, the study area was divided into five homogeneous regions. However, the goodness-of-fit results indicated that each homogeneous region had its own best-fit distribution, and distinctive climatic and geographic conditions were discussed as possible explanations. Overall, cluster analysis together with the L-moment-based regional frequency analysis technique was found to be successful in predicting rainfall estimates in northeastern Iran.

3.3 Probability Distribution Functions

In the present study we have focused on five probability distribution functions when determining the best-fit distribution for the region of study, all with three parameters: i) the Generalized Logistic distribution, ii) the Generalized Extreme Value distribution, iii) the Generalized Normal distribution, iv) the Pearson Type III distribution, and v) the Generalized Pareto distribution. Furthermore, we apply the four-parameter Kappa distribution when simulations are made in the goodness-of-fit tests and heterogeneity measures. Table A.2 presents the individual formulas for the probability density functions together with the quantile functions used in Section 5.4.

Generalized Logistic Distribution

The Generalized Logistic distribution (GLO) is a three-parameter family of continuous probability distributions with location, scale, and shape parameters, and it is a reparameterized version of the Log-Logistic distribution. It generalizes the two-parameter Logistic distribution, to which it reduces when the shape parameter is set to zero (Johnson et al., 1994). It has been used extensively in hydrological modeling, especially of maximum rainfall, and has good properties for fitting extreme values.
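For reference, in the parameterization of Hosking and Wallis (1997), with location ξ, scale α, and shape k, the GLO quantile function is x(F) = ξ + α[1 - ((1 - F)/F)^k]/k for k ≠ 0 and x(F) = ξ - α ln((1 - F)/F) for k = 0. A minimal sketch of that formula; the function name and the parameter values in the example are illustrative assumptions, not estimates for Skåne.

```python
import math


def glo_quantile(f: float, xi: float, alpha: float, k: float) -> float:
    """Generalized Logistic quantile x(F), Hosking and Wallis parameterization."""
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    odds = (1.0 - f) / f
    if abs(k) < 1e-12:
        # k = 0: the two-parameter Logistic distribution
        return xi - alpha * math.log(odds)
    return xi + alpha * (1.0 - odds ** k) / k


# The median (F = 0.5) equals the location parameter for every shape k.
print(glo_quantile(0.5, 30.0, 8.0, -0.1))  # 30.0
# Illustrative parameters only: a negative k gives a heavier upper tail.
print(glo_quantile(0.99, 30.0, 8.0, -0.1), glo_quantile(0.99, 30.0, 8.0, 0.0))
```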

Generalized Extreme Value Distribution

The Generalized Extreme Value distribution (GEV) has location, scale, and shape parameters. It is based on extreme value theory, which provides the statistical framework for inference about the probability of extreme events. By the extremal types theorem, the GEV is the only possible non-degenerate limiting distribution of the suitably normalized maximum of a sequence of independent and identically distributed random variables, and it is therefore often used as an approximation to the distribution of a maximum of random variables. The distribution unites the Gumbel, Fréchet, and Weibull families, also known as the Type I, Type II, and Type III Extreme Value distributions, which arise when the shape parameter is equal to zero, greater than zero, or less than zero, respectively (Johnson et al., 1994). It is the standard probability distribution in hydrological risk analysis in the UK and is widely used for rainfall and flood data (Fowler & Kilsby, 2003).
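The sketch below evaluates the GEV quantile function and estimates its parameters from the first two L-moments and the L-skewness, using the polynomial approximation for the shape parameter given by Hosking and Wallis (1997). It assumes Hosking's parameterization, in which the sign of the shape parameter k is the opposite of the convention in the paragraph above: k < 0 corresponds to the heavy-tailed Fréchet type and k > 0 to the bounded Weibull type. The function names and the example L-moments are illustrative assumptions.

```python
import math


def gev_quantile(f: float, xi: float, alpha: float, k: float) -> float:
    """GEV quantile x(F) in Hosking's parameterization (k < 0: heavy tail)."""
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    y = -math.log(f)
    if abs(k) < 1e-8:
        return xi - alpha * math.log(y)  # Gumbel (Type I) limit
    return xi + alpha * (1.0 - y ** k) / k


def gev_fit_lmoments(l1: float, l2: float, t3: float):
    """Method-of-L-moments estimates (xi, alpha, k) from l1, l2 and L-skewness t3."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # polynomial approximation for k
    if abs(k) < 1e-8:
        alpha = l2 / math.log(2.0)
        return l1 - 0.5772156649 * alpha, alpha, 0.0  # Euler's constant
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k


# Illustrative L-moments of data rescaled by the at-site mean (l1 = 1).
xi, alpha, k = gev_fit_lmoments(1.0, 0.18, 0.15)
for t in (10, 100):
    print(t, round(gev_quantile(1.0 - 1.0 / t, xi, alpha, k), 3))
```

The approximation avoids solving the nonlinear equation for k numerically and is intended for moderate L-skewness, which covers typical annual-maximum rainfall data.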


Generalized Normal Distribution

The Generalized Normal distribution (GNO) is a three-parameter probability distribution with location, scale, and shape parameters. It reduces to the usual Normal distribution when the shape parameter is equal to zero, in which case the location parameter is the arithmetic mean. The shape parameter introduces the skewness of the distribution: a shape parameter greater than zero results in left skewness, and a shape parameter less than zero results in right skewness. Since it can accommodate both symmetric and skewed data, the GNO distribution is also well suited when the data are believed to be approximately normally distributed (Varanasi & Aazhang, 1989).
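Inverting the GNO distribution function of Hosking and Wallis (1997), F(x) = Φ(y) with y = -k⁻¹ ln[1 - k(x - ξ)/α], gives the quantile function x(F) = ξ + α[1 - exp(-kz)]/k with z = Φ⁻¹(F), which reduces to ξ + αz when k = 0. A minimal sketch using Python's `statistics.NormalDist` for Φ⁻¹; the function name and parameter values are illustrative.

```python
import math
from statistics import NormalDist


def gno_quantile(f: float, xi: float, alpha: float, k: float) -> float:
    """Generalized Normal quantile x(F), Hosking and Wallis parameterization."""
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(f)  # standard Normal quantile
    if abs(k) < 1e-12:
        return xi + alpha * z  # k = 0: ordinary Normal distribution
    return xi + alpha * (1.0 - math.exp(-k * z)) / k


# With k < 0 the upper tail is longer than the lower tail (right skewness).
upper = gno_quantile(0.9, 30.0, 8.0, -0.2) - gno_quantile(0.5, 30.0, 8.0, -0.2)
lower = gno_quantile(0.5, 30.0, 8.0, -0.2) - gno_quantile(0.1, 30.0, 8.0, -0.2)
print(upper > lower)  # True
```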

Pearson Type III Distribution

The Pearson Type III distribution (PE3) is one of the seven types of Pearson probability distributions introduced by Pearson (1985), mainly for use in biostatistics, more specifically in the field of survival analysis. This distribution is also known as the three-parameter Gamma distribution, which extends the regular Gamma distribution with a location parameter, and it is well suited to modeling skewed or asymmetric data. There are two special cases of the Pearson Type III distribution. The first arises when its shape parameter takes the value zero, in which case it reduces to the Normal distribution with all its properties. The second arises when the shape parameter is equal to two, in which case it reduces to the Exponential distribution. It is used extensively for hydrological modeling and prediction, since extreme weather events such as heavy rainfall and flood flows are usually not symmetric around their mean, and because taking logarithms considerably simplifies the modeling of the underlying variable (Singh & Singh, 1985).

Generalized Pareto Distribution

The Generalized Pareto distribution (GPA) is a three-parameter probability distribution specified by location, scale, and shape parameters. When the shape parameter is equal to zero, the distribution reduces to the Exponential distribution. Another special case arises when the shape parameter is greater than zero and the location parameter is equal to the scale parameter divided by the shape parameter; the distribution then reduces to the two-parameter Pareto distribution. The GPA is frequently used to model the tails of other distributions, especially when the variability of a distribution is large and its tail cannot be estimated properly from the full distribution. It can then be applied as a separate model to estimate more precisely the behavior of the data above a certain threshold, known as exceedances (Arnold, 2011). Since the distribution is well suited for modeling the tails of distributions, it is often used for frequency analysis of extreme hydrological events such as annual maximum rainfall and river discharges.
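The special cases above can be checked numerically with the GPA quantile function of Hosking and Wallis (1997), x(F) = ξ + α[1 - (1 - F)^k]/k for k ≠ 0 and x(F) = ξ - α ln(1 - F) for k = 0. Note the assumption that Hosking's shape parameter k has the opposite sign to the convention used in the paragraph above, so the heavy-tailed Pareto-type case corresponds to k < 0 here. A minimal sketch with a function name of our own:

```python
import math


def gpa_quantile(f: float, xi: float, alpha: float, k: float) -> float:
    """Generalized Pareto quantile x(F), Hosking and Wallis parameterization."""
    if not 0.0 <= f < 1.0:
        raise ValueError("F must satisfy 0 <= F < 1")
    if abs(k) < 1e-12:
        # k = 0: Exponential distribution shifted to start at xi
        return xi - alpha * math.log(1.0 - f)
    return xi + alpha * (1.0 - (1.0 - f) ** k) / k


# Special cases: k = 0 is Exponential, k = 1 is Uniform on [xi, xi + alpha].
print(gpa_quantile(1.0 - math.exp(-1.0), 0.0, 1.0, 0.0))  # approximately 1.0
print(gpa_quantile(0.3, 0.0, 1.0, 1.0))                    # approximately 0.3
```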


3.4 L-moments

Hosking (1990) introduced L-moments as linear functions of probability weighted moments (PWMs). They can be defined for any random variable whose mean exists and are calculated as expectations of certain linear combinations of order statistics. The rth-order PWM, $\beta_r$, is defined as

$$\beta_r = \int_0^1 y(F)\, F^r \, dF, \qquad r = 0, 1, 2, \dots$$

where $y(F)$ is the quantile function of the distribution and $F = F(y)$ is its cumulative distribution function. The first four L-moments, expressed as linear combinations of PWMs, are defined as

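$$\lambda_1 = \beta_0$$

$$\lambda_2 = 2\beta_1 - \beta_0$$

$$\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0$$

$$\lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0$$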

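where the first L-moment ($\lambda_1$) is the measure of location (the mean), while the second L-moment ($\lambda_2$) is a measure of dispersion. Further, the L-moment ratios are defined as

$$\text{L-coefficient of variation:}\quad \tau_2 = \frac{\lambda_2}{\lambda_1}$$

$$\text{L-skewness:}\quad \tau_3 = \frac{\lambda_3}{\lambda_2}$$

$$\text{L-kurtosis:}\quad \tau_4 = \frac{\lambda_4}{\lambda_2}$$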

The unbiased sample estimators of the first four PWMs for any distribution are given below:

$$b_0 = n^{-1}\sum_{j=1}^{n} y_{j:n}$$

$$b_1 = n^{-1}\sum_{j=2}^{n} \frac{j-1}{n-1}\, y_{j:n}$$

$$b_2 = n^{-1}\sum_{j=3}^{n} \frac{(j-1)(j-2)}{(n-1)(n-2)}\, y_{j:n}$$

$$b_3 = n^{-1}\sum_{j=4}^{n} \frac{(j-1)(j-2)(j-3)}{(n-1)(n-2)(n-3)}\, y_{j:n}$$

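where the data $y_{j:n}$ are arranged in ascending order, $y_{1:n} \le y_{2:n} \le \dots \le y_{n:n}$. The sample L-moments $l_1, \dots, l_4$ are obtained by substituting $b_r$ for $\beta_r$ in the expressions for $\lambda_1, \dots, \lambda_4$, and the sample L-moment ratios $t_2, t_3, t_4$ follow in the same way. The parameter estimates are then obtained by equating the sample L-moments with the distribution L-moments.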
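The estimators above translate directly into code. The following sketch computes $b_0, \dots, b_3$, the sample L-moments, and the ratios for one site's record; the function name is ours, and the example data are a toy sample rather than observed rainfall.

```python
from typing import Iterable, Tuple


def sample_lmoments(data: Iterable[float]) -> Tuple[float, float, float, float, float]:
    """Return (l1, l2, t2, t3, t4) from the unbiased PWM estimators b0..b3.

    t2 = l2/l1 is the sample L-CV, t3 = l3/l2 the L-skewness and
    t4 = l4/l2 the L-kurtosis.
    """
    y = sorted(data)  # y[j-1] is the order statistic y_{j:n}
    n = len(y)
    if n < 4:
        raise ValueError("at least four observations are needed for b3")
    b0 = b1 = b2 = b3 = 0.0
    for j, yj in enumerate(y, start=1):  # j is the 1-based rank
        # Terms below each sum's lower limit vanish, since a factor
        # (j - 1), (j - 2) or (j - 3) is then zero.
        b0 += yj
        b1 += (j - 1) / (n - 1) * yj
        b2 += (j - 1) * (j - 2) / ((n - 1) * (n - 2)) * yj
        b3 += (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) * yj
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    # Substitute b_r for beta_r in the expressions for lambda_1..lambda_4.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l2 / l1, l3 / l2, l4 / l2


# Symmetric toy sample 1..5: l1 = 3, l2 = 1, L-CV = 1/3 and t3 = t4 = 0
# (up to floating-point rounding).
print(sample_lmoments([5, 3, 1, 4, 2]))
```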

Finally, since L-moments are linear functions of the data, they are less affected by sampling variability than conventional moments. They can distinguish between a wider range of distributions, and since the L-moment ratios are bounded ($|\tau_r| < 1$ for $r \ge 3$), they are often easier to interpret than ratios of conventional moments. Furthermore, L-moments are less sensitive to outliers in the data than conventional moments, which enables more reliable inference to be made from small samples about an underlying probability distribution. Hosking (1990) also stated that L-moments sometimes yield more efficient parameter estimates than the method of maximum likelihood.


Chapter 4

Methodology

Regional frequency analysis is applied to meet the objectives of this study. Hosking and Wallis (1997) presented four steps in regional frequency analysis which are described in detail below. The four steps followed in this approach are i) screening of data, ii) identification of homogeneous regions, iii) choice of frequency distribution, and iv) estimation of frequency distribution.

4.1 Screening of Data

After being collected, the data are screened to check whether they are appropriate for regional frequency analysis. The data collected at a site must be a true representation of the quantity being measured, and all data must be drawn from the same frequency distribution. Hosking and Wallis (1997) found that useful information could be obtained by comparing the sample L-moment ratios of different sites, since the sample L-moments can reveal incorrect data values, outliers, trends, and shifts in the mean of a sample. The L-moment ratios were therefore combined into a single statistic that measures the discordance between the L-moment ratios of a site and the average L-moment ratios of a group of similar sites. The discordance measure is described below.

4.1.1 Discordance Measure

The aim of the discordance measure is to identify sites that are sufficiently discordant with the group as a whole. Sites with gross errors in their data will stand out from the other sites and be flagged as discordant. Flagged sites should then be closely investigated for errors and uncertainties in the data. Discordance is measured in terms of the L-moment ratios of the sites' data. The discordance measure for site $i$ is defined as

$$ D_i = \frac{1}{3} N (\mathbf{u}_i - \bar{\mathbf{u}})^{T} \mathbf{S}^{-1} (\mathbf{u}_i - \bar{\mathbf{u}}) $$

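where $N$ is the number of sites, $\mathbf{u}_i = [t^{(i)}, t_3^{(i)}, t_4^{(i)}]^{T}$ is the vector of L-CV, L-skewness and L-kurtosis of site $i$, $\bar{\mathbf{u}} = N^{-1} \sum_{i=1}^{N} \mathbf{u}_i$ is the unweighted group average, and $\mathbf{S} = \sum_{i=1}^{N} (\mathbf{u}_i - \bar{\mathbf{u}})(\mathbf{u}_i - \bar{\mathbf{u}})^{T}$ is the matrix of sums of squares and cross-products. Hosking and Wallis (1997) stated that sites with $D_i > 3$ should be regarded as discordant, since they have L-moment ratios that are markedly different from the average for the other sites in the region.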
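The discordance measure can be computed in a few lines of code. The minimal Python sketch below assumes each site is summarized by its (L-CV, L-skewness, L-kurtosis) triple. It implements $D_i$ with $\mathbf{S}$ as the unnormalized matrix of sums of squares and cross-products. With this definition the $D_i$ always sum to $N$, so their average is 1. The example input is the at-site ratios listed in Table 5.1, together with the critical value 2.632 for an 11-site region quoted in Section 5.1. The function names `solve3` and `discordancy` are hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a @ y = b (Gaussian elimination, partial pivoting)."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    y = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        y[r] = (m[r][3] - sum(m[r][c] * y[c] for c in range(r + 1, 3))) / m[r][r]
    return y


def discordancy(ratios):
    """D_i = (N/3) (u_i - ubar)^T S^-1 (u_i - ubar) for each site's (t, t3, t4)."""
    n_sites = len(ratios)
    ubar = [sum(u[k] for u in ratios) / n_sites for k in range(3)]  # unweighted mean
    dev = [[u[k] - ubar[k] for k in range(3)] for u in ratios]
    # Matrix of sums of squares and cross-products (not divided by N - 1)
    s = [[sum(d[j] * d[k] for d in dev) for k in range(3)] for j in range(3)]
    return [n_sites / 3 * sum(dj * yj for dj, yj in zip(d, solve3(s, d)))
            for d in dev]


# (L-CV, L-skewness, L-kurtosis) from Table 5.1
TABLE_5_1 = {
    "Bromölla": (0.16685, 0.21720, 0.21609),
    "Klippan": (0.20507, 0.32668, 0.24160),
    "Kristianstad": (0.21717, 0.19866, 0.18354),
    "Landskrona": (0.21399, 0.19290, 0.11368),
    "Lund": (0.19415, 0.26115, 0.15434),
    "Malmö": (0.21938, 0.28525, 0.22246),
    "Osby": (0.16660, 0.24532, 0.20977),
    "Tomelilla": (0.17232, 0.16432, 0.05478),
    "Trelleborg": (0.19437, 0.27408, 0.22000),
    "Vomb": (0.18177, 0.16160, 0.15407),
    "Ystad": (0.21905, 0.29344, 0.23547),
}
CRITICAL_VALUE = 2.632  # for a region of exactly 11 sites (Hosking & Wallis, 1997)

d_values = discordancy(list(TABLE_5_1.values()))
for site, d in zip(TABLE_5_1, d_values):
    print(f"{site:<13} D = {d:.2f}{'  discordant' if d > CRITICAL_VALUE else ''}")
```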


4.2

Identification of Homogeneous Regions

The next step in regional frequency analysis is to identify homogeneous regions. The aim is to form groups of sites whose frequency distributions are, up to sampling variability, identical apart from a site-specific scale factor (Hosking & Wallis, 1993). This is achieved by assigning the sites to disjoint groups. Once a set of physically plausible regions has been defined, it is important to assess whether the regions are meaningful. This involves testing whether each proposed region can be considered homogeneous and whether two or more homogeneous regions are sufficiently similar that they could be combined into a single region.

4.2.1

Heterogeneity Measure

The heterogeneity measure estimates the degree of heterogeneity in a group of sites to assess whether the sites might reasonably be treated as a homogeneous region. It compares the between-site variation in sample L-moments for the group of sites with what would be expected for a homogeneous region. Plotting L-skewness against L-CV and against L-kurtosis gives a visual assessment of the dispersion of the at-site L-moment ratios. Another measure of the dispersion is the standard deviation of the at-site L-CVs, weighted proportionally to record length. The mean and standard deviation of the chosen dispersion measure are obtained through repeated simulations of a homogeneous region whose sites have the same record lengths as the observed data. The observed and simulated dispersions are compared through the statistic

$$ \frac{(\text{observed dispersion}) - (\text{mean of simulations})}{\text{standard deviation of simulations}} $$

A large positive value of this statistic, usually denoted $H_i$, indicates greater dispersion in the observed L-moment ratios than is consistent with the hypothesis of homogeneity. The observed dispersion of the at-site L-CVs is given by

$$ V_1 = \left\{ \frac{\sum_{i=1}^{N} n_i \left(t^{(i)} - t^R\right)^2}{\sum_{i=1}^{N} n_i} \right\}^{1/2} $$

while the other measures of between-site variability, based on L-CV against L-skewness and on L-skewness against L-kurtosis, are respectively given by

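$$ V_2 = \frac{\sum_{i=1}^{N} n_i \left\{ \left(t^{(i)} - t^R\right)^2 + \left(t_3^{(i)} - t_3^R\right)^2 \right\}^{1/2}}{\sum_{i=1}^{N} n_i}, \qquad V_3 = \frac{\sum_{i=1}^{N} n_i \left\{ \left(t_3^{(i)} - t_3^R\right)^2 + \left(t_4^{(i)} - t_4^R\right)^2 \right\}^{1/2}}{\sum_{i=1}^{N} n_i}. $$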
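The three dispersion measures translate directly into code. The minimal Python sketch below computes $V_1$, $V_2$ and $V_3$ from the record lengths $n_i$ and the at-site ratios, using the record-length-weighted regional averages $t^R$, $t_3^R$ and $t_4^R$. The function name `dispersion_measures` is hypothetical. The example uses the record lengths and ratios of Table 5.1 purely as input data.

```python
from math import sqrt


def dispersion_measures(n, t, t3, t4):
    """Return (V1, V2, V3): between-site dispersion of the at-site L-moment ratios."""
    total = sum(n)

    def weighted_mean(values):
        # Record-length-weighted regional average, e.g. t^R
        return sum(ni * v for ni, v in zip(n, values)) / total

    tr, t3r, t4r = weighted_mean(t), weighted_mean(t3), weighted_mean(t4)
    v1 = sqrt(sum(ni * (a - tr) ** 2 for ni, a in zip(n, t)) / total)
    v2 = sum(ni * sqrt((a - tr) ** 2 + (b - t3r) ** 2)
             for ni, a, b in zip(n, t, t3)) / total
    v3 = sum(ni * sqrt((b - t3r) ** 2 + (c - t4r) ** 2)
             for ni, b, c in zip(n, t3, t4)) / total
    return v1, v2, v3


# Record lengths and L-moment ratios from Table 5.1 (site order as in the table)
N_OBS = [58, 75, 128, 75, 154, 90, 66, 61, 75, 75, 65]
L_CV = [.16685, .20507, .21717, .21399, .19415, .21938, .16660, .17232, .19437, .18177, .21905]
L_SKEW = [.21720, .32668, .19866, .19290, .26115, .28525, .24532, .16432, .27408, .16160, .29344]
L_KURT = [.21609, .24160, .18354, .11368, .15434, .22246, .20977, .05478, .22000, .15407, .23547]

print(dispersion_measures(N_OBS, L_CV, L_SKEW, L_KURT))
```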


Once the values of $V_1$, $V_2$ and $V_3$ have been obtained, the test statistic $H_i$ is calculated as

$$ H_i = \frac{V_i - \mu_V}{\sigma_V} $$

A region is declared “acceptably homogeneous” if $H_i < 1$, “possibly heterogeneous” if $1 \le H_i < 2$, and “definitely heterogeneous” if $H_i \ge 2$. In the last case, subdividing the region is suggested in order to improve the accuracy of the quantile estimates. The values of $\mu_V$ and $\sigma_V$ are calculated from 1000 Monte Carlo simulations, assuming a four-parameter Kappa distribution as the parent model.
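The simulation step can be sketched as follows. This minimal Python illustration draws homogeneous regions from a Kappa parent by inverse-transform sampling. It uses Hosking's Kappa quantile function $x(F) = \xi + \frac{\alpha}{k}\left[1 - \left(\frac{1 - F^h}{h}\right)^k\right]$, which is valid for $k \ne 0$ and $h \ne 0$. For each simulated region it computes $V_1$, then standardizes the observed $V_1$. The function names are hypothetical. The record lengths, observed $V_1$ and Kappa parameters in the example are taken from Table 5.1, Table 5.2 and Section 5.3. Because the random numbers differ from those used in the study, $\mu_V$ and $\sigma_V$ will not match Table 5.2 exactly.

```python
import random
from math import sqrt
from statistics import mean, stdev


def kappa_quantile(f, xi, alpha, k, h):
    """Quantile function of the four-parameter Kappa distribution (k != 0, h != 0)."""
    return xi + alpha / k * (1.0 - ((1.0 - f ** h) / h) ** k)


def sample_lcv(sample):
    """Sample L-CV t = l2 / l1 from unbiased probability-weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0


def v1_statistic(record_lengths, lcvs):
    """Record-length-weighted standard deviation of the at-site L-CVs (V1)."""
    total = sum(record_lengths)
    tr = sum(n * t for n, t in zip(record_lengths, lcvs)) / total
    return sqrt(sum(n * (t - tr) ** 2 for n, t in zip(record_lengths, lcvs)) / total)


def heterogeneity_h1(record_lengths, observed_v1, kappa_params, nsim=1000, seed=1):
    """Return (H1, mu_V, sigma_V) from nsim simulated homogeneous regions."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(nsim):
        # Every site draws from the same Kappa parent, so the region is homogeneous
        lcvs = [sample_lcv([kappa_quantile(rng.random(), *kappa_params)
                            for _ in range(n)])
                for n in record_lengths]
        simulated.append(v1_statistic(record_lengths, lcvs))
    mu_v, sigma_v = mean(simulated), stdev(simulated)
    return (observed_v1 - mu_v) / sigma_v, mu_v, sigma_v


RECORD_LENGTHS = [58, 75, 128, 75, 154, 90, 66, 61, 75, 75, 65]  # Table 5.1
KAPPA = (0.8197, 0.2545, -0.1030, 0.0388)  # xi, alpha, k, h (Section 5.3)
h1, mu_v, sigma_v = heterogeneity_h1(RECORD_LENGTHS, 0.0162, KAPPA)
print(f"H1 = {h1:.2f} (mu_V = {mu_v:.4f}, sigma_V = {sigma_v:.4f})")
```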

The Kappa distribution is used for the simulations because it includes as special cases the Generalized Logistic, Generalized Extreme Value, and Generalized Pareto distributions, which are widely used for representing data in the environmental sciences and for modeling extreme values (Hosking & Wallis, 1997). Moreover, a four-parameter distribution tends to give less biased parameter estimates for three-parameter distributions, since it better mimics the shapes of their curves. The extra parameter compensates for latent bias. However, using parent distributions with many more parameters than the distributions being estimated leads to overparametrization and overfitting. An overfitted model normally has low bias but tends to have high variance, which makes its predictions less accurate (Good & Hardin, 2012).

For both real data and simulated regions, the H-statistics based on $V_2$ and $V_3$ lack the power to distinguish between homogeneous and heterogeneous regions. They rarely yield values above 2, even for grossly heterogeneous regions. The H-statistic based on $V_1$ therefore has stronger discriminatory power and is more reliable (Hosking & Wallis, 1997).

4.2.2

Cluster Analysis

Cluster analysis is a data reduction technique, mainly used in multivariate analysis, for dividing data into clusters. It is extensively used in frequency analysis to identify homogeneous regions. Each site is associated with a vector of data, such as the at-site statistics mentioned above, and the sites are partitioned according to the similarity of these vectors. According to Hosking and Wallis (1997), most clustering algorithms used in frequency analysis measure the similarity of the at-site statistics by Euclidean distance. Geometrically, the Euclidean distance is the straight-line distance between two points in a metric space. Here, the coordinates of each point are the characteristics or measured values of a site. The distance is defined as

$$ D_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2} $$

where $D_{ij}$ is the Euclidean distance between subjects $i$ and $j$, $x_{ik}$ and $x_{jk}$ are the values of the $k$th variable for the $i$th and $j$th subjects, and $p$ is the number of variables. The pairwise distances $D_{ij}$ are collected in a similarity (distance) matrix. A small value of $D_{ij}$ means that the two points being compared share similar characteristics (Sharma, 1996).
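The distance computation can be written as a minimal Python sketch. The code below first rescales each variable to zero mean and unit variance, then builds the matrix of pairwise Euclidean distances $D_{ij}$. The standardization step is an assumption added here: it makes variables measured on different scales (e.g. altitude in metres versus temperature in degrees) contribute comparably. The function names and the site attribute values are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column (variable) to zero mean and unit variance."""
    columns = list(zip(*rows))
    centres = [mean(c) for c in columns]
    scales = [pstdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, centres, scales)] for row in rows]


def distance_matrix(points):
    """Pairwise Euclidean distances D_ij = sqrt(sum_k (x_ik - x_jk)^2)."""
    return [[dist(a, b) for b in points] for a in points]


# Hypothetical site attributes: [mean annual maximum rainfall, altitude, temperature]
sites = [[33.5, 40.0, 7.5], [36.3, 85.0, 7.0], [31.3, 5.0, 8.9], [30.7, 70.0, 6.6]]
for row in distance_matrix(standardize(sites)):
    print(" ".join(f"{d:5.2f}" for d in row))
```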


Hierarchical clustering is a method of grouping data. By successively combining the points or sites with the most similar distances, the within-cluster variance is minimized and the between-cluster variance is maximized. Compared with non-hierarchical clustering, this technique is preferred since it does not require a priori knowledge of the number of clusters or partitions. The advantage of non-hierarchical clustering, however, is the possibility of re-clustering the sites if needed. Sharma (1996) argued that the two approaches should be viewed as complementary rather than competing. Five commonly used hierarchical clustering methods are the Centroid, Single-Linkage, Average-Linkage, Complete-Linkage, and Ward's methods. Ward's method has been used by Guttman (1993), while Hassan and Ping (2012) used both Ward's and the Complete-Linkage methods. For determining the number of clusters and where the cut-off point should be, Sharma (1996) stated that using R-squared ($R^2$) and the root mean square standard deviation (RMSSTD) leads to comprehensive results. Furthermore, there is no guarantee that the clusters will satisfy the homogeneity condition; clustering is merely an aid to regionalization. Hosking and Wallis (1997) concluded that there is no correct number of clusters and no perfect way of grouping sites, since sites with similar site statistics need not have similar frequency distributions.
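The minimal Python sketch below illustrates how Ward's method builds a hierarchy. It performs agglomerative clustering with the Lance-Williams update for Ward's criterion, applied to squared Euclidean distances. It records each merge and its cost, which is proportional to the increase in the within-cluster sum of squares. Choosing where to cut the tree (e.g. with $R^2$ and RMSSTD, as suggested by Sharma, 1996) is not shown. The function name and the example points are hypothetical.

```python
from math import dist


def ward_clustering(points):
    """Agglomerative clustering with Ward's criterion.

    Returns the merge history as (members_a, members_b, cost) tuples, where
    cost is the Lance-Williams Ward dissimilarity on squared Euclidean
    distances (proportional to the increase in within-cluster sum of squares).
    """
    clusters = {i: [i] for i in range(len(points))}
    # Squared distances keyed by (smaller id, larger id)
    d2 = {(i, j): dist(points[i], points[j]) ** 2
          for i in clusters for j in clusters if i < j}
    history = []
    next_id = len(points)
    while len(clusters) > 1:
        (a, b), cost = min(d2.items(), key=lambda item: item[1])
        na, nb = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            nc = len(clusters[c])
            dac = d2[(min(a, c), max(a, c))]
            dbc = d2[(min(b, c), max(b, c))]
            # Lance-Williams update for Ward's method
            updated[c] = ((na + nc) * dac + (nb + nc) * dbc - nc * cost) / (na + nb + nc)
        history.append((clusters[a], clusters[b], cost))
        merged = clusters.pop(a) + clusters.pop(b)
        d2 = {key: v for key, v in d2.items() if a not in key and b not in key}
        for c, v in updated.items():
            d2[(c, next_id)] = v  # next_id exceeds every existing id
        clusters[next_id] = merged
        next_id += 1
    return history


# Hypothetical standardized site statistics forming two separated groups
example = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [3.2, 2.9], [0.1, 0.3]]
for members_a, members_b, cost in ward_clustering(example):
    print(members_a, "+", members_b, f"cost = {cost:.3f}")
```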

4.3

Choice of Frequency Distribution

The aim when choosing a regional frequency distribution is to find one that provides accurate quantile estimates for each site. In practice, the region will generally be slightly heterogeneous, and a “true” distribution that applies to all the sites is hard to find. A robust approach is therefore recommended: one that yields reasonably accurate quantile estimates even when the true at-site frequency distribution deviates from the fitted regional distribution. Robustness depends on how well the upper and lower bounds of the at-site distributions are modeled. Hosking and Wallis (1997) argue that a set of candidate distributions capable of modeling the bounds of the true data should be applied. If a candidate distribution with no upper or lower bound is applied to a site whose distribution is bounded, the parameter estimation will yield incorrect results. There are sometimes theoretical reasons why certain frequency distributions are applicable to a given type of data. Gumbel (1954) reasoned that extreme value distributions may be well suited for extreme events such as maximum rainfall, since distributions with more parameters tend to yield less biased quantile estimates. Hosking and Wallis (1997) also concluded that candidate distributions with three or more parameters can be more reliable when estimating extreme values.


4.3.1

Goodness-of-Fit Measure

To find the best-fit distribution for a region, a goodness-of-fit test is in order. An important aspect of the procedure is that the parent distribution used in the Monte Carlo simulations should contain more parameters than the candidate distribution being tested. This avoids underfitting, since underfitted models fail to capture the underlying trend of the data. The goodness-of-fit test proposed by Hosking and Wallis (1997) judges how well the L-skewness and L-kurtosis of the fitted distributions match the regional average L-skewness and L-kurtosis of the observed data. The difference is assessed relative to the sampling variability of the regional average L-kurtosis, $t_4^R$. The standard deviation of $t_4^R$, denoted $\sigma_4$, is obtained through Monte Carlo simulations of a homogeneous region using a four-parameter Kappa distribution. The goodness-of-fit measure is given by

$$ Z^{\mathrm{DIST}} = \frac{\tau_4^{\mathrm{DIST}} - t_4^R + \beta_4}{\sigma_4} $$

where $\tau_4^{\mathrm{DIST}}$ is the L-kurtosis of the fitted candidate distribution, $t_4^R$ is the regional average L-kurtosis of the observed data, and $\beta_4$ is the bias of $t_4^R$, estimated from the simulated Kappa regions. The test statistic is approximately normally distributed when the sites have more than 20 observations. The fit of a candidate distribution is declared adequate if $|Z^{\mathrm{DIST}}| \le 1.64$, which corresponds to accepting the hypothesized distribution at a 90% confidence level.
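The goodness-of-fit statistic can be sketched in a few lines. In this minimal Python illustration (hypothetical function names), the bias and standard deviation are computed from simulated regional average L-kurtosis values $t_4^{[m]}$, $m = 1, \ldots, N_{sim}$, following Hosking and Wallis (1997): $\beta_4 = N_{sim}^{-1} \sum_m (t_4^{[m]} - t_4^R)$ and $\sigma_4 = [(N_{sim} - 1)^{-1} \{\sum_m (t_4^{[m]} - t_4^R)^2 - N_{sim} \beta_4^2\}]^{1/2}$. The theoretical L-kurtosis of a GEV distribution with shape $k$ (Hosking's parameterization) serves as an example of $\tau_4^{\mathrm{DIST}}$. The regional L-kurtosis and simulated values in the usage lines are invented placeholders.

```python
from math import sqrt


def gev_tau4(k):
    """L-kurtosis of the GEV distribution with shape k (Hosking's parameterization, k != 0)."""
    g2, g3, g4 = 1 - 2 ** -k, 1 - 3 ** -k, 1 - 4 ** -k
    return (5 * g4 - 10 * g3 + 6 * g2) / g2


def z_dist(tau4_fit, t4_regional, t4_simulated):
    """Goodness-of-fit statistic Z = (tau4_DIST - t4R + beta4) / sigma4."""
    nsim = len(t4_simulated)
    beta4 = sum(t - t4_regional for t in t4_simulated) / nsim
    ss = sum((t - t4_regional) ** 2 for t in t4_simulated)
    sigma4 = sqrt((ss - nsim * beta4 ** 2) / (nsim - 1))
    return (tau4_fit - t4_regional + beta4) / sigma4


def fit_is_adequate(z, critical=1.64):
    """Declare the candidate distribution adequate if |Z| <= critical."""
    return abs(z) <= critical


# Shape k from Table 5.3; t4R and the simulated values are invented placeholders
tau4_gev = gev_tau4(-0.111)
z = z_dist(tau4_gev, 0.183, [0.170, 0.195, 0.178, 0.188, 0.182, 0.176])
print(f"tau4(GEV) = {tau4_gev:.4f}, Z = {z:.2f}, adequate: {fit_is_adequate(z)}")
```

Note that $\sigma_4$ defined this way equals the ordinary sample standard deviation of the simulated $t_4^{[m]}$, so the bias correction only shifts the numerator.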

4.4

Estimation of Frequency Distribution

Finally, the sites are assigned to regions that are approximately homogeneous, and a probability distribution is chosen and fitted to each region's data. At this point, apart from a scale factor, the frequency distributions of the sites in a region are approximately identical.

4.4.1

Regional L-moment Algorithm

The aim of the regional L-moment algorithm is to fit a single frequency distribution to the data of the sites in a homogeneous region. This regional frequency distribution should describe the distribution of the observations at each site after scaling by the at-site scaling factor (index flood). To estimate quantiles of the at-site frequency distributions, the regional distribution is then rescaled appropriately at each site. The distribution is fitted by the method of L-moments, meaning that the parameters are estimated by equating the population L-moments of the distribution to the sample L-moments calculated from the data. Furthermore, the frequency distribution at each site is assumed to have a mean equal to its index flood, which is estimated at site $i$ by the sample mean of the at-site data. The regional average L-moment ratios $t^R, t_3^R, t_4^R, \ldots$, weighted proportionally to the sites' record lengths, are then given by

$$ t^R = \frac{\sum_{i=1}^{N} n_i t^{(i)}}{\sum_{i=1}^{N} n_i}, \qquad t_r^R = \frac{\sum_{i=1}^{N} n_i t_r^{(i)}}{\sum_{i=1}^{N} n_i}, \quad r = 3, 4, \ldots $$

where $n_i$ is the record length of site $i$ and $t^{(i)}, t_3^{(i)}, t_4^{(i)}, \ldots$ are its sample L-moment ratios. The regional average mean (index flood) is then set to 1, $l_1^R = 1$. The distribution is then fitted by equating its L-moments $\lambda_1, \tau, \tau_3, \tau_4, \ldots$ to the regional averages $l_1^R, t^R, t_3^R, t_4^R, \ldots$ calculated above.

The estimate of the quantile with non-exceedance probability F at site i is then given by

$$ \hat{Q}_i(F) = l_1^{(i)} \hat{q}(F) $$

where $l_1^{(i)}$ is the sample mean (index flood) at site $i$ and $\hat{q}(\cdot)$ is the quantile function of the fitted regional frequency distribution (Hosking & Wallis, 1997).
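The steps of the regional L-moment algorithm can be sketched for the GEV distribution. This minimal Python illustration (hypothetical function names) first forms the record-length-weighted regional averages. It then sets $l_1^R = 1$, so that $\lambda_2 = t^R$. Next it fits the GEV by L-moments, using Hosking's rational approximation $k \approx 7.8590c + 2.9554c^2$ with $c = 2/(3 + t_3^R) - \ln 2/\ln 3$. Finally it rescales the regional growth curve $\hat{q}(F)$ by an at-site mean. The inputs are taken from Table 5.1 for illustration only, and the printed quantiles need not coincide with Table 5.4.

```python
from math import gamma, log


def regional_average(record_lengths, values):
    """Record-length-weighted regional average, e.g. t^R or t_3^R."""
    return sum(n * v for n, v in zip(record_lengths, values)) / sum(record_lengths)


def fit_gev(l1, l2, t3):
    """GEV parameters (xi, alpha, k) from L-moments (Hosking's approximation for k)."""
    c = 2.0 / (3.0 + t3) - log(2) / log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    alpha = l2 * k / ((1 - 2 ** -k) * gamma(1 + k))
    xi = l1 - alpha * (1 - gamma(1 + k)) / k
    return xi, alpha, k


def gev_quantile(f, xi, alpha, k):
    """GEV quantile function x(F) = xi + alpha * (1 - (-ln F)^k) / k."""
    return xi + alpha / k * (1 - (-log(f)) ** k)


n_obs = [58, 75, 128, 75, 154, 90, 66, 61, 75, 75, 65]  # Table 5.1
l_cv = [.16685, .20507, .21717, .21399, .19415, .21938, .16660, .17232, .19437, .18177, .21905]
l_skew = [.21720, .32668, .19866, .19290, .26115, .28525, .24532, .16432, .27408, .16160, .29344]

# Regional index-flood scaling: l1^R = 1, so lambda_2 = t^R
growth = fit_gev(1.0, regional_average(n_obs, l_cv), regional_average(n_obs, l_skew))
lund_mean = 32.07662  # at-site mean l_1^(i) for Lund, Table 5.1
for f in (0.80, 0.90, 0.98, 0.99):
    print(f"F = {f:.2f}: Q = {lund_mean * gev_quantile(f, *growth):.2f}")
```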

4.4.2

Accuracy Measures

Since results obtained by statistical analysis may be rather uncertain, the accuracy of the estimated quantiles is assessed. This is required for the results to be maximally useful and credible.

Firstly, the relative root mean square error (RMSE) of the estimators will be approximated through M = 1000 Monte Carlo simulations. The formula is given by

$$ R_i(F) = \left[ M^{-1} \sum_{m=1}^{M} \left\{ \frac{\hat{Q}_i^{[m]}(F) - Q_i(F)}{Q_i(F)} \right\}^2 \right]^{1/2} $$

where $\hat{Q}_i^{[m]}(F)$ is the quantile estimate for non-exceedance probability $F$ at site $i$ in the $m$th simulated region, $Q_i(F)$ is the corresponding true quantile, and $\{\hat{Q}_i^{[m]}(F) - Q_i(F)\}/Q_i(F)$ is the relative error of the estimate. The relative error is squared and averaged over all $M$ simulations, and the square root of this average approximates the relative RMSE of the estimator (Hosking & Wallis, 1997).
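The Monte Carlo procedure can be sketched in simplified form for a single site. This minimal Python illustration is a single-site simplification of the regional simulation described above, and its function names are hypothetical. It draws samples from a GEV distribution with known parameters and refits the GEV to each sample by L-moments, using Hosking's approximation for the shape parameter. It then computes the relative RMSE of the estimated quantile from the relative errors. The parent parameters are the GEV estimates reported in Table 5.3, used only as plausible values.

```python
import random
from math import gamma, log, sqrt


def gev_quantile(f, xi, alpha, k):
    """GEV quantile function x(F) = xi + alpha * (1 - (-ln F)^k) / k, k != 0."""
    return xi + alpha / k * (1 - (-log(f)) ** k)


def fit_gev(data):
    """Fit (xi, alpha, k) by sample L-moments; Hosking's approximation for k."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2.0 / (3.0 + l3 / l2) - log(2) / log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    alpha = l2 * k / ((1 - 2 ** -k) * gamma(1 + k))
    return l1 - alpha * (1 - gamma(1 + k)) / k, alpha, k


def relative_rmse(estimates, true_value):
    """R(F) = square root of the mean squared relative error."""
    return sqrt(sum(((q - true_value) / true_value) ** 2 for q in estimates)
                / len(estimates))


def simulate_relative_rmse(params, n, f, nsim=1000, seed=1):
    """Monte Carlo relative RMSE of the fitted F-quantile for samples of size n."""
    rng = random.Random(seed)
    true_q = gev_quantile(f, *params)
    estimates = []
    for _ in range(nsim):
        sample = []
        while len(sample) < n:
            u = rng.random()  # lies in [0, 1); u = 0 is skipped because log(0) is undefined
            if u > 0.0:
                sample.append(gev_quantile(u, *params))
        estimates.append(gev_quantile(f, *fit_gev(sample)))
    return relative_rmse(estimates, true_q)


GEV_PARAMS = (0.825, 0.249, -0.111)  # regional GEV estimates, Table 5.3
for f in (0.80, 0.90, 0.98, 0.99):
    print(f"F = {f:.2f}: relative RMSE = {simulate_relative_rmse(GEV_PARAMS, 75, f):.3f}")
```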

Thereafter, a 95% confidence interval is approximated, also through M = 1000 Monte Carlo simulations. The formula is given by

$$ \frac{\hat{Q}(F)}{U_{.025}(F)} \le Q(F) \le \frac{\hat{Q}(F)}{L_{.025}(F)} $$

where $\hat{Q}(F)$ and $Q(F)$ are the estimated and true values of the quantile, respectively. $L_{.025}(F)$ and $U_{.025}(F)$ are the values below and above which 2.5% of the simulated ratios $\hat{Q}(F)/Q(F)$ fall. The limits $\hat{Q}(F)/U_{.025}(F)$ and $\hat{Q}(F)/L_{.025}(F)$ are the 95% error bounds. The accuracy of the error estimates depends on the number of simulations, $M$. Larger values of $M$ give more accurate error estimates, and $M = 1000$ is considered to give accurate results (Hosking & Wallis, 1997).
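Given simulated ratios $\hat{Q}^{[m]}(F)/Q(F)$, the 95% error bounds follow from their empirical 2.5% and 97.5% points. The minimal Python sketch below uses the standard library's `statistics.quantiles` to find these points; the function name `error_bounds` is hypothetical. The simulated ratios in the example are hypothetical log-normal draws. They stand in for the output of a Monte Carlo run such as the one described above. The estimate 78.39 is the 100-year GEV quantile for Bromölla from Table 5.4.

```python
import random
from statistics import quantiles


def error_bounds(q_hat, ratios):
    """95% bounds q_hat / U_.025 <= Q <= q_hat / L_.025 from simulated Q_hat/Q ratios."""
    cuts = quantiles(ratios, n=40, method="inclusive")  # 39 cut points, step 2.5%
    lower_ratio, upper_ratio = cuts[0], cuts[-1]        # L_.025 and U_.025
    return q_hat / upper_ratio, q_hat / lower_ratio


rng = random.Random(1)
simulated_ratios = [rng.lognormvariate(0.0, 0.06) for _ in range(1000)]  # hypothetical
low, high = error_bounds(78.39, simulated_ratios)
print(f"{low:.2f} <= Q(0.99) <= {high:.2f}")
```

The upper ratio gives the lower bound and vice versa: if estimates tend to overshoot by a factor $U$, the true value can be as low as $\hat{Q}/U$.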


Chapter 5

Results and Discussion

5.1

Discordant Measures

The first step in regional frequency analysis is to identify discordant sites. The criterion, explained in section 4.1.1, is that a site should be regarded as discordant if its test value $D_i > 3$. However, Hosking and Wallis (1997) provide a more specific critical value for small regions: when a region contains exactly 11 sites, a site should be regarded as discordant if its test value exceeds 2.632. Table 5.1 shows that none of the sites exhibits critical discordance, meaning that all of the sites are suitable for the regional frequency analysis.

The sites with the highest discordant values were Tomelilla and Vomb while the least discordant site was Trelleborg. Since the discordant measure is based on a comparison between regional and at-site L-moments, large at-site deviations are a major cause of the discordance. By looking at the kurtosis column of table 5.1, the value of Tomelilla is the most deviant, which is why its value of Di is the highest.

Furthermore, the standard deviations of precipitation at the sites are relatively high, in general almost a third of their respective means. Ystad has the highest variability of rainfall, followed by Klippan, and the lowest variability is found at Osby.

One of the main reasons why none of the 11 sites was found to be discordant is their similar average rainfall. The third column of Table 5.1 displays the at-site mean rainfall, and all sites have relatively similar values. Another site characteristic that affects precipitation is average temperature. Table 2.1 displays a narrow temperature interval ranging from 6.56 to 9.06 degrees Celsius, meaning that the whole province exhibits similar temperatures. Furthermore, the altitude of the sites ranges from 5 to 85 metres, where the lowest elevation is shared by Landskrona, Malmö and Trelleborg. It is, however, analytically difficult to find any interpretable correlation as to why some sites return similar discordance values. There are no clear patterns in the position or altitude of the sites that would correspond to similarities in discordance. Nevertheless, it cannot be ruled out that discordance is also influenced by latent factors such as stream flows, wind patterns, and other hydrological phenomena that could affect the frequency and magnitude of rainfall in Skåne.


Since Sweden is, in general, rarely affected by extreme natural disasters, it was expected that none of the sites would show any discordance. No outliers or shifts in the mean were found for the sites either. Furthermore, at-site statistics, not at-site characteristics, were used in the discordance test. This may have influenced the test scores, since similar site characteristics do not necessarily yield equivalent rainfall frequencies. Hence, it was more reliable to use site statistics rather than characteristics such as average temperature or altitude.

Descriptive Statistics of Sites

Site Name     n    Mean      Median  S         L-CV    L-skewness  L-kurtosis  Di
Bromölla      58   33.48621  31.45   10.54621  .16685  .21720      .21609      1.33
Klippan       75   36.27200  32.50   14.81738  .20507  .32668      .24160      1.12
Kristianstad  128  33.21641  31.35   12.83804  .21717  .19866      .18354      .60
Landskrona    75   31.91467  29.00   11.14497  .21399  .19290      .11368      .68
Lund          154  32.07662  29.10   11.74141  .19415  .26115      .15434      .63
Malmö         90   32.74444  30.00   13.98872  .21938  .28525      .22246      .85
Osby          66   30.68939  28.95   9.684152  .16660  .24532      .20977      1.32
Tomelilla     61   33.44262  32.40   10.17552  .17232  .16432      .05478      1.87
Trelleborg    75   31.34533  29.00   11.85418  .19437  .27408      .22000      .20
Vomb          75   34.58000  32.90   11.39020  .18177  .16160      .15407      1.54
Ystad         65   33.25846  29.70   14.98905  .21905  .29344      .23547      .85

Table 5.1: At-site descriptive results together with L-moments and discordance of the 11 sites.


5.2

Heterogeneity Measures

The heterogeneity measures calculated using Monte Carlo simulations were each based on a different measure of between-site dispersion of the L-moment ratios: i) the weighted standard deviation of the L-CVs, ii) the average of the L-CV/L-skewness distances, and iii) the average of the L-skewness/L-kurtosis distances. Since a standardized test value of $H_1 < 1$ was observed, the measure based on the weighted standard deviation of the L-CVs was deemed sufficient (Hosking & Wallis, 1997). The results are presented in table 5.2.

Since the test value of $H_1$ was observed to be -.12, the condition $H_1 < 1$ is fulfilled and the region is regarded as “acceptably homogeneous” (Hosking & Wallis, 1997). This is further illustrated in figures B.1 and B.2, where at-site L-skewness is plotted against L-CV and L-kurtosis, respectively. Moreover, the negative value of $H_1$ indicates that the dispersion among the at-site sample L-CVs is less than would be expected of a homogeneous region with independent at-site frequency distributions. This is most likely caused by positive correlation between the data values at different sites. However, since $H_1 > -2$, there is no indication of large amounts of cross-correlation between the sites' frequency distributions, or of excessive regularity in the data that could cause the sample L-CVs to be unusually close together (Hosking & Wallis, 1997). Hence, further examination of the data was deemed unnecessary.
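As a check on the reported statistic, substituting the rounded values of table 5.2 into $H_1 = (V_1 - \mu_V)/\sigma_V$ gives

$$ H_1 = \frac{0.0162 - 0.0167}{0.0039} = \frac{-0.0005}{0.0039} \approx -0.13, $$

which agrees with the reported value of -.12 up to the rounding of the tabulated inputs.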

Since the province of Skåne is rather uniform in observable site characteristics, the between-site dispersion of the L-moments tends to be small. For larger regions, e.g. the whole of Sweden, the weather might differ far more due to different geographical and hydrological features. This would make homogeneous regions difficult to find and would usually require subdivision and clustering. As mentioned in section 5.1, the sites display similar average precipitation and altitude, which is one of the main reasons why the whole province could be regarded as homogeneous. There are presumably further underlying factors that contribute to the frequency and magnitude of rainfall and thereby affect the values of the L-moments. However, for the convenience of the study, these have not been examined further.

According to Hosking and Wallis (1997), the H-statistics should be considered a guideline for accepting a region as homogeneous. Since bias may occur in the process of data collection, the thresholds mentioned earlier hold better if the observations at each site are independent and if the actual underlying distribution is the Kappa distribution. These conditions were not met in this study. However, with the regional approach, the H-statistic is less affected, and the quantile estimates for the accepted region are considered robust and will still be acceptable.

Between-Site Dispersion of L-moment Ratios

Observed S.D. of group L-CV              .0162
Simulated mean of S.D. of group L-CV     .0167
Simulated S.D. of S.D. of group L-CV     .0039
Standardized test value H1               -.12


5.3

Goodness-of-fit Measures

The goodness-of-fit test is executed in a similar way to the heterogeneity measure. The sites in a homogeneous region are assumed to have L-moment ratios equal to the regional average. The location and scale parameters of each candidate distribution are therefore chosen to match the regional average mean and L-CV, and its shape parameter to match the regional L-skewness. The goodness-of-fit is consequently judged by how well the L-kurtosis of each fitted distribution matches that of the observed data.

Table 5.3 presents the statistics of the five aforementioned distributions used in the test. The (**) indicates the distributions that satisfy the null hypothesis, namely that the fitted distribution does not differ from the regional distribution. These distributions are candidates for quantile estimation. The Generalized Extreme Value distribution and the three-parameter Generalized Normal distribution were found to be consistent with the data. The values of L-kurtosis for the GEV and GNO distributions are .1838 and .1695, respectively. The simulated homogeneous region was based on 1000 Monte Carlo simulations from the four-parameter Kappa distribution. Its estimated location, scale, shape, and additional parameter $\hat{h}$ were .8197, .2545, -.1030, and .0388, respectively. The three parameter estimates for the GEV distribution are .825, .249, and -.111. Correspondingly, those for the GNO distribution are .916, .308, and -.506. A comparison of all the hypothesized distributions is presented in table 5.3, where the GEV and GNO distributions have the parameter estimates closest to those of the Kappa distribution. The $Z^{\mathrm{DIST}}$ statistics of the GEV and GNO distributions are .04 and -.87, respectively. These are the only candidate distributions to satisfy $|Z^{\mathrm{DIST}}| \le 1.64$, with corresponding p-values of .9681 and .3843. The p-value is the probability of obtaining a $|Z^{\mathrm{DIST}}|$ at least as large as the observed one if the candidate were the true regional distribution. A large p-value therefore indicates good agreement with the data. Based on the results, it is concluded that the GEV distribution is the most consistent with the maximum annual daily rainfall data in Skåne and will therefore yield the most accurate quantile estimates.
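Because $Z^{\mathrm{DIST}}$ is approximately standard normal, the p-values in table 5.3 can be obtained from the Z-scores. The minimal Python sketch below uses the two-sided normal tail probability from the standard library's `statistics.NormalDist`; the function name `two_sided_p` is hypothetical. For GLO, GEV and GNO the result agrees with the tabulated p-values up to the rounding of the Z-scores. For PE3 ($Z = -2.54$) it gives approximately .0111.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a statistic that is approximately standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


Z_SCORES = {"GLO": 2.11, "GEV": 0.04, "GNO": -0.87, "PE3": -2.54, "GPA": -5.10}  # Table 5.3

for name, z in Z_SCORES.items():
    verdict = "adequate" if abs(z) <= 1.64 else "rejected"
    print(f"{name}: Z = {z:+.2f}, p = {two_sided_p(z):.4f}, {verdict}")
```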

Goodness-Of-Fit

Distribution  L-kurtosis  Z-score  P-value  µ̂        α̂       k̂
GLO           .2162       2.11     .0348    (.924)    (.175)  (-.243)
GEV           .1838       .04 **   .9681    (.825)    (.249)  (-.111)
GNO           .1695       -.87 **  .3843    (.916)    (.308)  (-.506)
PE3           .1434       -2.54    .0111    (1.000)   (.366)  (1.468)
GPA           .1032       -5.10    < .0001  (.570)    (.521)  (.215)

Table 5.3: Goodness-of-fit tests using 1000 Monte Carlo simulations. ˆµ denotes the location, ˆα denotes the scale, and ˆk denotes the shape parameter.


Figure 5.1 shows the L-moment ratio diagram, in which the theoretical L-kurtosis of each candidate distribution is plotted against its L-skewness (y- and x-axis, respectively). The red point in the middle represents the mean of the regional L-moment ratios simulated from the Kappa distribution. The symbols on the right side of the graph represent the two-parameter special cases of each candidate distribution, where “E” denotes “Extreme Value”, “N” denotes “Normal”, “U” denotes “Uniform”, “G” denotes “Gamma”, and “L” denotes “Log-Normal”. As illustrated in the figure, the curve of the GEV distribution lies closest to the regional mean, which further suggests its suitability for the region.

A reason why the GEV distribution is the most consistent with the data may lie in the properties of the distribution. As mentioned in section 3.3, the GEV distribution accommodates both positive and negative values of the shape parameter, and hence both bounded and heavy upper tails, and is applicable to extreme values. The estimated shape parameter of the Kappa distribution used in the simulations is negative. This parameter is also negative for the GLO and GNO distributions; however, the closest fit is held by the GEV distribution. Both the location and scale parameters of the GEV distribution are positive and lie closest to those of the Kappa distribution, which further strengthens its suitability.

As mentioned in section 4.3.1, the Z-statistic based on the regional average L-kurtosis, $t_4^R$, is approximately normally distributed only if the sites within the region contain more than 20 observations. If this criterion fails to hold, the Z-statistic is not approximately normal, and the critical value for judging the fit of each candidate distribution becomes indeterminable. Furthermore, the standard deviation of the simulated L-kurtosis then usually becomes small, which gives misleading indications for the inference (Hosking & Wallis, 1997). In this study, all sites have at least 58 observations, so this particular problem has been avoided. However, the Z-statistic may not be reliable if cross-correlation is present in the data or if the at-site observations are not identically distributed. As presented in table 5.2, the $H_1$ statistic was slightly negative, which indicates that some cross-correlation might be present in the data. However, since $H_1 > -2$, the cross-correlation was deemed small enough not to affect the robustness of the test. Additionally, the sites have different record lengths and therefore cannot be considered identically distributed (Hosking & Wallis, 1997). This problem is less crucial when using the regional approach, since the region as a whole is more robust to at-site data discrepancies. Hence, the Z-statistic can be considered valid.


5.4

Quantile Estimates and Accuracy Measures

The quantile estimation is based on the two probability distributions most consistent with the data, i.e. the Generalized Extreme Value and the Generalized Normal distributions. The formulas used in the estimation are found in figure A.2. The quantile function is evaluated at the non-exceedance probability corresponding to each return period, giving the estimated precipitation levels for the 1-, 5-, 10-, 50-, and 100-year return periods. Table 5.4 presents the at-site quantile estimates for the GEV distribution, obtained by multiplying the mean precipitation of each site by the regional average quantiles. The table exhibits the fitted values and relative RMSE for each return period together with their 95% lower and upper confidence bounds. The first result column corresponds to a 1% probability that the maximum daily rainfall in a year does not exceed the fitted value, respectively for each site. Furthermore, the variability of the GEV estimates tends to increase for longer return periods. The 100-year quantiles are much larger and have wider confidence bounds, which indicates greater uncertainty than for the first return period. The last column corresponds to a 99% probability that the maximum daily rainfall in any given year does not exceed the fitted level, i.e. the level is exceeded on average once every 100 years. The site with the lowest estimated quantile for all return periods was Osby, and the site with the highest was Klippan. For the last return period, Klippan had the highest relative RMSE and Lund the lowest.
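The relation between return period and non-exceedance probability can be made explicit. Assuming independent annual maxima, the $T$-year quantile has non-exceedance probability $F = 1 - 1/T$ in any single year. The probability that it is exceeded at least once in $T$ years is $1 - F^T$; for $T = 100$ this is $1 - 0.99^{100} \approx 0.63$. The "year 1" column of table 5.4, with $F = 0.01$, corresponds to $T = 1/0.99 \approx 1.01$ years. The minimal Python sketch below (hypothetical function names) also shows the at-site scaling $\hat{Q}_i(F) = l_1^{(i)} \hat{q}(F)$, using an illustrative growth factor.

```python
def non_exceedance(return_period_years):
    """Annual non-exceedance probability F = 1 - 1/T of the T-year quantile."""
    return 1.0 - 1.0 / return_period_years


def return_period(f):
    """Return period T = 1 / (1 - F) for annual non-exceedance probability F."""
    return 1.0 / (1.0 - f)


def prob_exceeded_within(return_period_years, horizon_years):
    """Probability that the T-year level is exceeded at least once within a horizon,
    assuming independent annual maxima."""
    return 1.0 - non_exceedance(return_period_years) ** horizon_years


def at_site_quantile(site_mean, growth_factor):
    """Q_i(F) = l_1^(i) * q(F): scale a regional growth factor by the at-site mean."""
    return site_mean * growth_factor


for t in (5, 10, 50, 100):
    print(f"T = {t:3d}: F = {non_exceedance(t):.2f}, "
          f"P(exceeded within {t} years) = {prob_exceeded_within(t, t):.3f}")
print(f"F = 0.01 corresponds to T = {return_period(0.01):.3f} years")
# Illustrative growth factor of 2.0 applied to the Bromölla site mean from Table 5.1
print(at_site_quantile(33.48621, 2.0))
```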

Table B.3 exhibits the at-site quantile estimates with their respective 95% confidence bounds and relative RMSE, calculated using the next best-fit distribution, the GNO distribution. The sites with the lowest and highest fitted quantiles for all return periods were again Osby and Klippan, respectively, as with the GEV distribution. In line with theory provided by Hosking and Wallis (1997), the variability, and hence the confidence intervals, are greater for all periods using the GNO distribution than using the GEV distribution. In addition, table A.1 shows the 0, 25, 50, 75, and 100% quantiles for each individual site. These values are, however, not based on return periods calculated from a fitted distribution with a given non-exceedance probability; they are merely summary statistics of the observed precipitation at each site.

The regional relative RMSE, displayed in table B.4, was calculated for the intended return periods using both the GEV and GNO distributions. For both distributions, the relative RMSE for the first period is far higher than for all the remaining periods, which seemed anomalous. The same finding was made by Hosking and Wallis (1997) using data from the study by Plantico et al. (1990); however, the cause of this anomaly was not identified. The values of the relative RMSE are affected by the amount of correlation within the region. The relative regional bias measures the error in the fitted quantiles when the estimated distribution does not match the upper and lower tails of the underlying distribution. This bias could be one of the reasons why the relative RMSE is very high for the first return period. Furthermore, from the 5-year return period onward, the regional relative RMSE increases gradually until the end of the prediction horizon. This is in line with expectations, since longer return periods are associated with greater uncertainty in prediction.


The at-site relative RMSE for the GEV and GNO distributions is presented in tables 5.4 and B.3, respectively. The estimates generated from the GEV distribution exhibit a lower RMSE for the first return period, which then increases gradually through the remaining periods. The GNO distribution exhibits a different pattern: the at-site relative RMSE is relatively high in the first period, decreases in the second period, and then increases gradually through the remaining periods. This further suggests the suitability of the GEV distribution relative to the GNO distribution. Figures B.3 and B.4 illustrate the regional growth curves for the GEV and GNO distributions, respectively, with dotted lines representing the upper and lower error bounds. Moreover, figures B.3 and B.4 give a graphical view of table B.4. The bounds in figure B.4 are wide at the first and last return periods and narrow around the 5-year return period. Figure B.3 displays a slight tendency towards the same shape, but far less pronounced. This is because the at-site relative RMSE for the GEV distribution increases gradually for every subsequent return period. Hosking and Wallis (1997) give another reason why these bounds are wider for the first period: sampling variability. This variability causes the sample L-moment ratios to scatter more than the regional ones and may originate from heterogeneity in the region. It also widens the error bounds of the growth curves.

In most statistical analyses, the deviation increases with the length of the prediction horizon. Estimating quantiles for return periods longer than the at-site record length yields lower confidence, especially as the return period increases. The only sites with record lengths greater than a hundred years are Kristianstad and Lund. The remaining sites have shorter records, which yields greater uncertainty in prediction for the longer return periods. As stated by Hosking and Wallis (1997), the confidence bounds are only valid if the distribution of $\hat{Q}(F)/Q(F)$ is independent of all parameters involved in specifying the underlying model and if the observations are identically distributed. The parameters of the regional L-moment algorithm are the L-moment ratios and the at-site means. These assumptions rarely hold in practice, and they do not hold in this study either, so the confidence bounds should be interpreted as approximations. However, the bounds are based on 1000 simulations, and the regional frequency approach increases the accuracy, making the error bounds smaller. Besides increasing the number of observations at the gauged sites, using the best-fit probability distribution when estimating the non-exceedance levels for the return periods will decrease the variability of the quantile estimates. Since the Generalized Extreme Value distribution was the most consistent with the data, it consequently yields the most accurate estimates for the return periods relative to the other candidate distributions.

Using this information, policy makers and city planners can more easily base decisions covering a certain span of time on the maximum annual daily rainfall estimated for a given return period. In this way, damage and destruction can be minimized, and water resource management and infrastructure planning can be made more efficient. A subsequent step, not included in this study, would be the estimation of extreme rainfall at ungauged sites, which has been a recurring difficulty in statistical hydrology. This study is limited to estimating quantiles for gauged sites, that is, sites where rainfall has actually been observed. If homogeneous regions can be found, applying the procedures used in this study facilitates estimation at sites with missing or insufficient data. A common estimation method, used by Hussain (2017) among others, is linear regression. It typically uses at-site characteristics such as average rainfall and elevation as independent variables and the at-site L-moment mean as the dependent variable. Further, the adjusted coefficient of determination, $R^2_{\mathrm{adj}}$, together with p-values is an efficient way to determine which at-site characteristics to use. By applying the best-fit distribution of the region, the regression can be coupled with the quantile estimates to estimate extreme rainfall at ungauged sites more precisely.
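A minimal sketch of the regression step for ungauged sites is given below, assuming ordinary least squares with an intercept and using NumPy only. The names `fit_index_regression` and `estimate_ungauged` are hypothetical, the predictor columns stand in for characteristics such as average rainfall and elevation, and p-values are not computed here (a statistics package would normally report them). The estimate at an ungauged site is the predicted at-site mean multiplied by the regional growth curve values $\hat{q}(F)$.

```python
import numpy as np

def fit_index_regression(X, y):
    """OLS of the at-site L-moment mean (y) on site characteristics (X).

    X : (n_sites, p) array of predictors, e.g. average rainfall, elevation
    y : (n_sites,) array of at-site means l1
    Returns the coefficients (intercept first) and the adjusted R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalise extra predictors
    return beta, r2_adj

def estimate_ungauged(beta, x_new, growth):
    """Index-flood estimate at an ungauged site: predicted mean times q(F)."""
    mu = beta[0] + np.dot(beta[1:], np.asarray(x_new, dtype=float))
    return mu * np.asarray(growth, dtype=float)

# Hypothetical data with an exact linear relation y = 2 + 3x
beta, r2_adj = fit_index_regression([[1.0], [2.0], [3.0], [4.0]],
                                    [5.0, 8.0, 11.0, 14.0])
```

The factor $(n-1)/(n-p-1)$ is what lets $R^2_{\mathrm{adj}}$ compare candidate sets of site characteristics of different sizes, which plain $R^2$ cannot do because it never decreases when a predictor is added.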


At-Site Quantile Estimates (GEV)

Site          Statistic  1%         80%        90%        98%        99%
                         (Year 1)   (Year 5)   (Year 10)  (Year 50)  (Year 100)
Bromölla      Lower      13.06      40.41      47.12      62.91      70.15
              Fit        15.41      41.49      49.35      68.99      78.39
              Upper      18.27      42.69      51.71      75.19      86.85
              RMSE       1.822      2.263      3.021      5.473      6.845
Klippan       Lower      14.16      43.77      51.04      68.15      75.99
              Fit        16.70      44.94      53.45      74.73      84.91
              Upper      19.80      46.24      56.01      81.45      94.08
              RMSE       1.909      2.246      3.039      5.649      7.179
Kristianstad  Lower      12.96      40.08      46.74      62.41      69.59
              Fit        15.29      41.15      48.95      68.44      77.76
              Upper      18.13      42.34      51.29      74.59      81.15
              RMSE       1.673      1.576      2.258      4.550      5.901
Landskrona    Lower      12.45      38.51      44.91      59.73      66.86
              Fit        14.69      39.54      47.03      65.76      74.71
              Upper      17.42      40.68      49.28      71.67      82.77
              RMSE       1.759      1.862      2.542      4.750      6.035
Lund          Lower      12.51      38.70      45.14      60.27      67.20
              Fit        14.76      39.74      47.27      66.09      75.09
              Upper      17.50      40.89      49.53      72.03      83.19
              RMSE       1.456      1.420      2.051      4.153      5.540
Malmö         Lower      12.77      39.51      46.07      61.53      68.60
              Fit        15.07      40.57      48.25      67.47      76.65
              Upper      17.87      41.74      50.56      73.54      84.92
              RMSE       1.637      1.797      2.470      4.171      6.047
Osby          Lower      11.98      37.03      43.18      57.66      64.29
              Fit        14.13      38.02      45.22      63.23      71.84
              Upper      16.75      39.12      47.39      68.92      79.59
              RMSE       1.674      1.957      2.635      4.882      6.048
Tomelilla     Lower      13.05      40.35      47.06      62.83      70.06
              Fit        15.39      41.43      49.28      68.90      78.29
              Upper      18.25      42.63      51.64      75.10      86.74
              RMSE       1.853      2.242      2.970      5.332      6.707
Trelleborg    Lower      12.23      37.83      44.11      58.89      65.67
              Fit        14.43      38.84      46.19      64.58      73.38
              Upper      17.11      39.96      48.40      70.39      81.30
              RMSE       1.661      1.896      2.581      4.824      6.136
Vomb          Lower      13.50      41.72      48.66      64.98      72.45
              Fit        15.92      42.84      50.96      71.25      80.95
              Upper      18.88      44.08      53.40      77.66      89.69
              RMSE       1.811      2.177      2.979      5.580      7.077
Ystad         Lower      12.98      40.13      46.80      62.49      69.67
              Fit        15.31      41.21      49.01      68.53      77.85
              Upper      18.15      42.40      51.36      74.69      86.25
              RMSE       1.758      2.175      2.959      5.444      6.872

Table 5.4: At-site quantile estimates, $\hat{Q}_i(F) = l_1^{(i)}\,\hat{q}(F)$, with 95% confidence bounds.
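The quantile formula in the caption of Table 5.4 scales the regional growth curve $\hat{q}(F)$ by the at-site mean $l_1^{(i)}$, which acts as the index flood. The sketch below assumes a GEV growth curve in Hosking's parameterization, $x(F) = \xi + \alpha\{1 - (-\ln F)^k\}/k$ (with the Gumbel form $\xi - \alpha \ln(-\ln F)$ when $k = 0$), and the usual link $F = 1 - 1/T$ between non-exceedance probability and return period $T$. The regional parameter values are not reported in this section, so any values passed in are hypothetical, and the function names are illustrative. Note that the 1-year column of the table uses F = 0.01 rather than this link.

```python
import math

def gev_quantile(F, xi, alpha, k):
    """GEV quantile x(F) in Hosking's parameterisation, for 0 < F < 1."""
    y = -math.log(F)
    if abs(k) < 1e-9:                       # Gumbel limit of the GEV
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k

def at_site_quantile(l1_site, F, xi, alpha, k):
    """Index-flood estimate Q_i(F) = l1^(i) * q(F) with a GEV growth curve."""
    return l1_site * gev_quantile(F, xi, alpha, k)

def non_exceedance(T):
    """Non-exceedance probability for a return period of T years."""
    return 1.0 - 1.0 / T

# Hypothetical growth-curve parameters and at-site mean (not the thesis values)
xi, alpha, k = 0.8, 0.3, -0.1
for T in (5, 10, 50, 100):
    print(T, round(at_site_quantile(30.0, non_exceedance(T), xi, alpha, k), 2))
```

With $k < 0$ the upper tail is unbounded, so the estimates keep increasing with the return period, consistent with the pattern of the fitted values in Table 5.4.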


Chapter 6

Conclusion

In this study, the annual maximum daily rainfall of 11 gauging sites in Skåne, Sweden, was analyzed. The aim of the study was to i) identify homogeneous regions among the 11 sites, ii) find the best-fit probability distribution for the regions, iii) estimate the magnitude of rainfall at each site, and iv) assess the accuracy of the estimates. Regional rainfall frequency analysis was performed to identify the best-fit distributions, and the method of L-moments was applied to estimate the parameters of the distributions.
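The method of L-moments mentioned above starts from sample L-moments, computed here from unbiased probability-weighted moments $b_0, \dots, b_3$, and then matches the distribution parameters to them. The sketch below shows this for the GEV distribution using a standard polynomial approximation for the shape parameter $k$; it also includes a record-length-weighted regional average of at-site ratios, as used in the regional L-moment algorithm. The function names are hypothetical, and this is not necessarily the software used in the study.

```python
import math
import numpy as np

def sample_lmoments(x):
    """Sample L-moments via unbiased probability-weighted moments b0..b3.

    Returns (l1, l2, t, t3, t4), where t = l2/l1 is the L-CV.
    Requires at least four observations.
    """
    x = np.sort(np.asarray(x, dtype=float))   # ascending order statistics
    n = x.size
    j = np.arange(1, n + 1)                   # ranks 1..n
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l2 / l1, l3 / l2, l4 / l2

def regional_average(values, record_lengths):
    """Record-length-weighted regional average of an at-site L-moment ratio."""
    w = np.asarray(record_lengths, dtype=float)
    return float(np.sum(w * np.asarray(values, dtype=float)) / w.sum())

def gev_from_lmoments(l1, l2, t3):
    """GEV parameters (xi, alpha, k) matched to l1, l2 and the L-skewness t3."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c           # polynomial approximation for k
    if abs(k) < 1e-8:                         # Gumbel limit
        alpha = l2 / math.log(2.0)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k
```

For a regional growth curve, which has mean 1, one would call `gev_from_lmoments(1.0, tR, t3R)` with the regional averages of the L-CV and L-skewness.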

First, the discordance measure, $D_i$, was calculated for each site to identify sites that were discordant with the group as a whole. None of the sites was found to be discordant, so all of the sites were deemed suitable for the regional frequency analysis. Subsequently, the heterogeneity measures, $H_i$, were calculated to estimate the degree of heterogeneity in the group of sites. The test value of $H_1$ was -.12. Since this is below 1, the threshold for an acceptably homogeneous region given by Hosking and Wallis (1997), all of the sites could be treated as one homogeneous region. To find the best-fit distribution of the region, goodness-of-fit tests were performed using 1000 Monte Carlo simulations. The candidate distributions used in the tests were the GLO, GEV, GNO, PE3, and GPA distributions. The GEV and GNO distributions were the only distributions that satisfied $|Z^{\mathrm{DIST}}| \le 1.64$, and they generated p-values of .9681 and .3843, respectively. Thus, it was concluded that the GEV distribution was the most consistent distribution for the maximum annual daily rainfall data in the region, followed by the GNO distribution.
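The discordance measure $D_i$ used above can be computed directly from the at-site L-moment ratios. This sketch follows the Hosking and Wallis (1997) definition, $D_i = \frac{N}{3}(\mathbf{u}_i - \bar{\mathbf{u}})^{\mathsf T}\mathbf{A}^{-1}(\mathbf{u}_i - \bar{\mathbf{u}})$, where $\mathbf{u}_i = (t^{(i)}, t_3^{(i)}, t_4^{(i)})^{\mathsf T}$ holds the L-CV, L-skewness and L-kurtosis of site $i$, and $\mathbf{A} = \sum_i (\mathbf{u}_i - \bar{\mathbf{u}})(\mathbf{u}_i - \bar{\mathbf{u}})^{\mathsf T}$. The function name is hypothetical, and the random inputs in the example are synthetic, not the study data.

```python
import numpy as np

def discordance(t, t3, t4):
    """Discordance D_i for each of N sites from their L-moment ratios."""
    u = np.column_stack([t, t3, t4]).astype(float)   # one row per site
    n = u.shape[0]
    d = u - u.mean(axis=0)                            # deviations from the mean
    A = d.T @ d                                       # sums of squares and products
    A_inv = np.linalg.inv(A)
    # quadratic form d_i^T A^{-1} d_i for every site i, scaled by N/3
    return n / 3.0 * np.einsum("ij,jk,ik->i", d, A_inv, d)

# Synthetic example with 11 sites
rng = np.random.default_rng(0)
D = discordance(rng.normal(0.20, 0.02, 11),
                rng.normal(0.15, 0.03, 11),
                rng.normal(0.15, 0.03, 11))
```

By construction the $D_i$ sum to $N$, so their average is 1; a site is flagged as discordant when its value exceeds the critical value that Hosking and Wallis (1997) tabulate for the number of sites in the region.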

These two distributions were then used for quantile estimation: the rainfall magnitudes corresponding to the non-exceedance probabilities of the 1-, 5-, 10-, 50-, and 100-year return periods were estimated. The GEV distribution generated the most accurate results, with narrower confidence bounds and lower at-site relative RMSE than the GNO distribution. Furthermore, for both distributions the fitted non-exceedance magnitude increased with each successive return period. It was concluded that the GEV distribution generated the most accurate quantile estimates and should therefore be the preferred model for estimating annual extreme rainfall in the province of Skåne.

This study addresses the anticipation of extreme rainfall and approaches for estimating it more accurately. With accurate models, extreme precipitation can be estimated, and city planning, among other types of construction, may be improved. In further research, these methods could be applied to ungauged sites.


Figure 2.1: Geographical location of the province and the 11 gauged sites.
Table 2.1: At-site summary statistics for the 11 Sites.
Table 5.1: At-site descriptive results together with L-moments and discordance of the 11 sites.
Table 5.2: Descriptive statistics of L-CV together with the $H_1$-statistic.

References
