
Validation of a Public Transport Model


Academic year: 2021


STOCKHOLM SWEDEN 2020


YOUSEF AHO

JOHANNES DE JONG


Degree Projects in Systems Engineering (30 ECTS credits)

Master’s Programme in Industrial Engineering and Management (120 credits)
KTH Royal Institute of Technology, year 2020

Supervisor at Trafikförvaltningen: Erik Almlöf
Supervisor at KTH: Per Enqvist


TRITA-SCI-GRU 2020:239 MAT-E 2020:066

Royal Institute of Technology

School of Engineering Sciences KTH SCI

SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


During 2018, the Public Transport Administration (Trafikförvaltningen) in the Stockholm region spent approximately 2.2 billion SEK on new infrastructure investments related to the public transport system, many of which were based on its public transport models. The method previously used to validate these models has lacked scientific rigour, efficiency and a systematic approach, which has led to uncertainty in decision making. Furthermore, few scientific studies have been conducted to develop validation methodologies for large-scale models, such as public transport models. For these reasons, a scientific validation methodology for public transport models has been developed in this thesis. This validation methodology has been applied to the 2014 route assignment model used by Trafikförvaltningen, for the transport modes bus, commuter train and local tram.

In the developed validation methodology, the selected validation metrics MAPE, %RMSE and R² are used to compare link loads from a route assignment model with observed link loads from an Automatic Passenger Counting (APC) system. To obtain an overview of the performance of the route assignment model, eight different scenarios are defined, based on whether each of the three validation metrics meets its acceptable threshold or not.

In the application of the developed validation methodology, the average link loads for the morning rush have been validated. To adjust the developed validation methodology to system-specific factors and to set acceptable metric thresholds, discussions with model practitioners have taken place. The validation has been performed on both lines and links, and for bus, entire line number series have been validated as well. The validation results show that a higher proportion of the commuter train measurements meet the set threshold values than those of bus and local tram. However, Trafikförvaltningen is recommended to further calibrate the route assignment model in order to achieve a better model performance. The developed validation methodology can be used for validation of public transport models and, in combination with model calibration, can be used in an iterative process to fine-tune model parameters for optimising validation results. Finally, a number of recommendations are proposed for Trafikförvaltningen to increase the efficiency and quality of the validation process, such as synchronising model data with the observed data.


During 2018, Trafikförvaltningen spent approximately 2.2 billion SEK on new infrastructure investments for the public transport system in Stockholm, many of which were based on its public transport models. The previous method for validating these models has lacked a solid scientific basis, efficiency and a systematic approach, which has led to uncertainty regarding investment decisions. Furthermore, few scientific studies have been conducted to develop validation methodologies for large-scale models, such as public transport models. For these reasons, a scientific validation methodology for public transport models has been developed in this degree project. This validation methodology has been applied to Trafikförvaltningen's 2014 route assignment model, for the transport modes bus, commuter train and local tram.

In the developed validation methodology, the selected validation metrics MAPE, %RMSE and R² have been used to compare link loads from a route assignment model with observed link loads from an Automatic Passenger Counting system (ATR). To give an overview of the model's precision, eight scenarios have been defined, based on whether or not the validation metrics are accepted according to threshold values.

In the application of the developed validation methodology, the average link loads for the morning rush hour have been validated. To adjust the developed validation methodology to system-specific factors and to set acceptable threshold values for the validation metrics, discussions have been held with traffic analysts. The validation has been performed on both lines and links, and for bus, entire line series have also been validated. The validation results for commuter train show a higher proportion of accepted measurements than those for bus and local tram. Trafikförvaltningen is, however, recommended to further calibrate the route assignment model in order to achieve a better result.

The developed validation methodology can be used for validations of public transport models and, in combination with model calibration, can be used in an iterative process to fine-tune model parameters and thereby optimise the validation results. Finally, a number of recommendations are proposed for Trafikförvaltningen to increase the efficiency and quality of the validation process, for example synchronising model data with observed data.


First and foremost, we want to thank our supervisor at Trafikförvaltningen, Erik Almlöf, for guidance and support. We also thank Gerasimos Loutos, Zafeira Gkioulou and the analysts at Strategic Development at Trafikförvaltningen for their valuable and detailed input, and Ivana Rodriguez Ewerlöf from Resenärsservice at Trafikförvaltningen for providing us with data and contributing to the methodology. Lastly, we would like to thank our supervisor at KTH Royal Institute of Technology, Per Enqvist, for guiding us in the process.


1 Introduction
1.1 Background
1.2 Research Aim
1.3 Purpose
1.4 Transportation Planning Process
1.4.1 Four Step Demand Model
1.4.2 Modelling Systems
1.5 Related Work
1.5.1 Public Transport Administration
1.5.2 Swedish Transport Administration
1.6 Delimitations
1.7 Thesis Outline
2 Model Validation
2.1 Quantitative Validation
2.1.1 Validation Metrics
2.1.2 Examples of Case Studies
2.2 Qualitative Validation
2.2.1 Expert Interviews
2.2.2 Travel Surveys
2.3 Evaluation of Methods
2.3.1 Quantitative Validation
2.3.2 Qualitative Validation
3 Method
3.1 Choice of Validation Metrics
3.2 Model Data
3.3 Observed Data Sources
3.4 Data Management
3.4.1 Model Data
3.4.2 Observed Data
3.5 Validation Process
3.6 Data Summary
3.6.1 Model Data
3.6.2 Observed Data
3.7 Thresholds and Scenarios
4 Results
4.1 Examples of Scenarios
4.2 Bus
4.3 Commuter Train
4.4 Local Tram
5 Discussion
5.1 Choice of Validation Metrics
5.2 Scenarios
5.2.1 Possible Explanations
5.2.2 Implementation
5.3 Results
5.3.1 Comparison Between Transport Modes
5.3.2 Thresholds
5.3.3 Possible Data Errors
5.4 Evaluation of Data Sources
5.4.1 Automatic Passenger Counting (APC)
5.4.2 Automatic Fare Collection (AFC)
5.4.3 Manual Counting Studies (MTS)
5.5 Implementation
5.5.1 Level of Analysis
5.6 Limitations
5.7 Further Research
5.7.1 Validation of Four Step Demand Model
5.7.2 Potential Future Data Sources
5.7.3 Model Quality
6 Conclusion
Bibliography
Appendices
A Validation Results for Bus on Line Level
B Validation Results for Aggregated Bus Links

1.1 Illustration of the four step demand model.
2.1 Illustration of a model development process.
2.2 Example of the R² with and without using a line of best fit.
4.1 Example of eight different scenarios on line level for bus.
4.2 All link load measurements for bus.
4.3 All link load measurements for commuter train.

2.1 Summary of some properties of the validation metrics.
3.1 Bus line operational area by number series.
3.2 Threshold for used validation metrics.
3.3 Validation metric criteria for the different scenarios.
4.1 Examples of given scenarios.
4.2 Data summary for validation of bus.
4.3 Summary of the validation results per aggregation for bus.
4.4 Validation of bus on group level.
4.5 Ratios of scenarios for bus for the different aggregations.
4.6 Data summary for validation of commuter train.
4.7 Summary of the validation results per aggregation for commuter train.
4.8 Validation of commuter train on line level.
4.9 Validation of commuter train for the largest 10 aggregated links.
4.10 Ratios of scenarios for commuter train for the different aggregations.
4.11 Data summary for validation of local tram.
4.12 Summary of the validation results per aggregation for local tram.
4.13 Validation of local tram on line level.
4.14 Ratios of scenarios for local tram.
5.1 Possible explanations for metric threshold acceptance and non-acceptance.
A.1 Validation of bus on line level.

1 Introduction

1.1 Background

Model validation is defined as the process in which the accuracy of a model’s representation of a target system is determined. [4] It can also be defined as the process in which a model’s output is systematically compared to independent real-world observations. [11] Model validation is a key aspect of model building, since it determines the model’s degree of usefulness and its accuracy. [20] Despite the importance of model validation, few studies have been conducted to develop validation methodologies for large-scale models, such as public transport models. [3]

The Public Transport Administration (Trafikförvaltningen) develops infrastructure and public transport in Stockholm, Sweden. Trafikförvaltningen uses a four step demand model to describe the current demand and forecast the future demand of the public transport system in the Stockholm region. The four steps are trip generation, trip distribution, mode choice and route assignment. [16] The four steps are implemented using the national transport modelling system Sampers, which estimates the travel demand per mode (the first three steps), and PTV Visum, which models the route assignment step (how passengers traverse the public transport system).

Trafikförvaltningen has developed a model based on the year 2017, which is the “current” model. Moreover, a variety of models have been developed for the purpose of forecasting travel demand in the years 2030 and 2050. The models are used as a decision tool, for instance to determine the size of investments. Until now, the validation of the “current” model has not been performed in a systematic and scientific way. Methodologically, the number of boarding and alighting passengers at key stations and stops in the city center has been compared for the morning rush and the 24-hour average. The results of this process have been difficult to interpret, and it has been difficult to obtain an overview of them. Moreover, it is not evident that the number of boarding and alighting passengers is the best measure for determining whether the model correctly represents reality.

The lack of scientific validation methods for public transport systems leads to uncertainty in the decision-making processes. [5] During 2018, Trafikförvaltningen spent approximately 2.2 billion SEK on new infrastructure investments, many of which were based on the different models. [24] To increase the certainty in the decision-making process and, consequently, the cost-efficiency and quality of the investments in the public transport system, it is necessary to understand how accurately the models represent reality. There is therefore value in implementing a scientific, detailed and large-scale validation methodology that provides interpretable results.

1.2 Research Aim

The aim of this master's thesis is to document an overview of current scientific methods of model validation, both within and outside the public transport domain. A further aim is to develop and document a methodology that can be implemented and used effectively by Trafikförvaltningen.

1.3 Purpose

This master's thesis has a threefold purpose. First, it aims to contribute to the scientific body of knowledge regarding methods for validating public transport models. Second, it aims to determine the accuracy of the model used by Trafikförvaltningen. Third, it aims to evaluate the different data sources that Trafikförvaltningen has access to.

1.4 Transportation Planning Process

1.4.1 Four Step Demand Model

The four step demand model is recognised as the main tool for determining the current demand and forecasting the future demand of a given transport system. It is also used to determine the performance of the system. The model is designed to be used at a national, regional and sub-regional level and its main purpose is to evaluate projects related to infrastructure. [16] An illustration of the four step demand model and the interactions between the different steps is provided in Figure 1.1. [27]


Figure 1.1: Illustration of the four step demand model.

Trip Generation

Trip generation is the first step of the four step demand model. In this step, a measure of the frequency of trips is generated. The purpose of this step is to determine the volume of travel in the system. [16]

Trip Distribution

In trip distribution, the second step, all possible links between each trip origin and trip destination are created. In this step, a link is defined as a feasible route connecting the trip origin with the trip destination. Moreover, trip tables are generated to represent all possible links between each trip origin and trip destination and the corresponding impedance (time and/or cost) of choosing a specific link. [16]

Mode Choice

In mode choice (also called modal split), the third step, the trip tables generated in the second step are used to construct mode-specific trip tables. Here, a mode refers to a transport mode, that is, a means of travel such as bus or commuter train. The mode-specific tables reflect the proportion of trips conducted using each mode of transport. [16]

Route Assignment

In the final step, route assignment (also called transit assignment), the mode specific trip tables generated in the third step are assigned to mode-specific networks. [16]

1.4.2 Modelling Systems

To perform the steps trip generation, trip distribution and mode choice, a travel demand model is used with input data regarding population, households and employment. In combination with this, a route assignment model is used. The specific modelling systems, Sampers and PTV Visum, used by Trafikförvaltningen in the Stockholm region are described below.

Sampers

Sampers is the travel demand modelling system used for forecasts at national and regional level. It aims to forecast, in a qualified and systematic way, travel demand for different modes of transport, such as air, rail, road, walking and biking. The input data comes from travel surveys, with parameters such as level of infrastructure, macroeconomic factors and public transport fares. In Sampers, the demand is calculated for each travel mode. [25]

PTV Visum

PTV Visum is one of the largest commercial route assignment software packages. In the Stockholm region, it is used to assign routes based on the input from Sampers. The PTV Visum model developed by Trafikförvaltningen to describe the current model year is based on the time of year when the largest number of trips is made in the public transport system, which usually occurs during the period September to November. From PTV Visum, passenger data can be extracted in different formats, either aggregated or broken down all the way to boarding, alighting and link load data for each station/stop, mode of transport and route. [29]

1.5 Related Work

Trafikförvaltningen and the Swedish Transport Administration (Trafikverket) continuously work on validating their models in order to understand and calibrate them. Previous work on model validation by Trafikförvaltningen and Trafikverket is described below.

1.5.1 Public Transport Administration

In the most relevant validation report from Trafikförvaltningen, observed data from the Automatic Passenger Counting (APC) system and from manual counting studies (MTS) are compared with the 2014 route assignment model in PTV Visum, based on input from the travel demand models SIMS and Sampers. SIMS is an older model, developed in the late 1980s based on a travel habit survey. It has historically been seen as superior in handling congestion issues that are specific to Stockholm. In recent years, Sampers has gradually been accepted as the superior model in the Stockholm region. Another route assignment software, Emme, is used in other regions, but not in the Stockholm region, because the route network for the Stockholm public transport system is underdeveloped in Emme. The validation is done for the transport modes bus, commuter train, subway and local tram, and for two measurement periods: the morning rush hour and the 24-hour average. [1]

In the report, the values from the PTV Visum/Sampers and PTV Visum/SIMS models and the observed data are illustrated graphically, highlighting the deviations. The relative percentage deviation is measured for certain stations/stops. In general, the more aggregated comparisons of the total number of passengers per transport mode show a low deviation between the observed and the modelled values. One exception is the subway, where the observed 24-hour average of boarding passengers differs by almost 20% from the modelled values. On a station/stop level, the deviations show a higher variance: most comparisons have a low deviation, but certain stations, especially central ones, have large deviations between the observed and modelled values. These deviations are explained in most cases, but the report notes that these explanations carry low certainty. In general, underestimations at one station/stop are offset by overestimations at another, implying an error in the route assignment model rather than in the travel demand model. Causes for this are often rooted in passengers' travel preferences, such as station/stop comfort or a wish to avoid congestion, which are difficult to incorporate into PTV Visum and Sampers. [1]

1.5.2 Swedish Transport Administration

In a report from Trafikverket, an example of a validation process using Sampers as the modelling system is shown. In this report, Trafikverket uses model results for the base year 2014, with updated statistics from the 2016 and 2018 versions of the model. These versions are relevant since statistics are updated long after the actual year. [26]

An attempt is made to validate the number of people crossing a hypothetical line through Stockholm (“SM-snittet”) and entering the toll zone surrounding inner Stockholm, by private as well as public transport. A comparison is made with both the 2016 and the 2018 versions of the 2014 base year, showing deviations between the modelled and the observed values. Unfortunately, the validation method is not described transparently. [26]

In addition to presenting the comparisons broken down by mode of transport, a regression analysis is performed on the observations in the road network in the Stockholm region and in the city of Stockholm. The model is validated using the R² with a line of best fit, which measures the proportion of the variance in the observed values that can be explained by the linear relationship. The results show a strong correlation between the modelled and observed values. [26]

1.6 Delimitations

In this master's thesis, the focus is on operational validation of a route assignment model. Thus, verification of the computerized model and its implementation is not the primary focus. The travel demand model, its underlying assumptions and its interactions with the route assignment model are not validated. Since public transport investments are mainly based on information regarding link load capacity, the link load is validated instead of the number of boarding and alighting passengers. The validation of the model used by Trafikförvaltningen excludes the transport modes subway and boat due to a lack of reliable and extensive link load data.
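Link load, as used here, is taken to mean the number of passengers on board a vehicle between two consecutive stops. As a minimal sketch, assuming per-stop boarding and alighting counts for a single vehicle trip listed in stop order (a hypothetical input format, not Trafikförvaltningen's data model), link loads can be derived as a running total of net boardings. The function name `link_loads` is illustrative only.

```python
from itertools import accumulate


def link_loads(boardings, alightings):
    """Passengers on board on each link between consecutive stops.

    Hypothetical input: boardings[k] and alightings[k] are the counts at
    stop k of one vehicle trip, listed in stop order.
    """
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must have equal length")
    net = [b - a for b, a in zip(boardings, alightings)]
    # The load on the link leaving stop k is the running sum of net boardings
    # up to and including stop k; the last stop has no outgoing link.
    return list(accumulate(net))[:-1]


print(link_loads([10, 5, 3, 0], [0, 2, 6, 10]))  # [10, 13, 10]
```

Comparing link loads rather than boardings and alightings directly targets the quantity that matters for capacity, namely the load on each link.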

1.7 Thesis Outline

The remainder of this thesis is structured as follows. Chapter 2 provides a detailed overview of model validation, with a focus on quantitative and qualitative validation. The chapter presents the literature review on validation methods along with an evaluation of the methods. Chapter 3 introduces the validation methodology of this thesis and outlines the process of the performed validation. Chapter 4 presents the validation results, and Chapter 5 discusses the results, the chosen validation methodology and its implementation, along with further research. Finally, the conclusions are presented in Chapter 6.

2 Model Validation

Validation and verification are integral parts of the model development process, which is shown in Figure 2.1. [20] This process and its different elements are discussed below.

The problem entity is the system to be modelled (existing or proposed). The conceptual model is a representation of the problem entity that is developed for a specific purpose. The computerized model is an implementation of the conceptual model.

Through analysis and modelling, a conceptual model is built, partly based on assumptions about the problem entity. The conceptual model is validated by determining that its underlying assumptions are correct and reasonable. The computerized model is verified by ensuring that no mistakes were made in the programming and implementation of the conceptual model. Finally, operational validation is performed to validate that the computerized model represents reality, that is, the problem entity. Operational validation is defined as ensuring that the output of the model falls within an accepted range of accuracy for the specific use and purpose of the model. This is often done through experimentation.

In all steps, and for all models, data validity is needed. Data validity is the process of ensuring that the input data used for model building, model evaluation and conducting experiments are correct. [20] Since part of the purpose of this thesis is to determine the accuracy of a computerized model’s representation of reality, the methodological focus of this thesis is mainly on operational validation.

Who develops the validation methodology makes a vital difference when validating a model. The most common case is validation by the developers of the model. This is often done as part of the model development process and is based on input from the developers themselves. However, this induces a significant risk, as the tests and validation metrics designed are likely to mirror the design of the system itself and can therefore disregard certain sections or aspects of the system. Therefore, one of the following two approaches is recommended. [20]

One approach is to let the users of the system develop a methodology for validating the outputs, in close cooperation with the model developers. In this approach, the validity of the model must be confirmed not only for the aspects that the developers intended to be used, but also for the aspects actually used in practice, which are generally broader. [20]

A second, preferred approach is to use a third party, independent of both the development team and the users, to develop the validation methodology. When this approach is used, it is vital that the third party has a thorough understanding of the system to be validated. Because of this, third-party validation often involves a higher cost and is done primarily for large-scale systems with high complexity. In these types of systems, it is difficult and time-consuming for a developer or user to accurately validate all aspects of the system. With third-party validation, the third party can perform tests with little or no apparent correlation with the tests performed by the developers and users. This often increases the quality and credibility of the validation methodology. [20]

However, determining that a model is completely valid is often considered excessively costly and time-consuming. Because of this, model validation is often performed by conducting a limited number of different tests and evaluations until a satisfactory level of confidence in the model is achieved. If the result of an evaluation indicates that the model is inaccurate for the given experimental conditions, the model is considered to be invalid. Conversely, a satisfactory level of accuracy in all tests conducted does not imply that the model is valid in its domain of usage. [20]

2.1 Quantitative Validation

2.1.1 Validation Metrics

One way of validating a model quantitatively is to measure its accuracy using validation metrics. Using graphical comparison alone, also known as graphical validation, is not considered fully sufficient, since it can be difficult to identify model output errors and model uncertainties on a detailed level. Furthermore, it makes comparison between different outputs difficult. Because of this, quantitative validation requires the use of statistical methods and various validation metrics to quantify the model’s accuracy. [17] Validation metrics provide information regarding the correspondence between the modelled values and the observed values. Currently, there is no universal standard for which validation metric to use. [13] Thus, an outline of validation metrics from various fields is provided below. The notation $m$ denotes the modelled value and the notation $c$ denotes the observed value.

The Mean Absolute Percentage Error (MAPE) is often used to determine the accuracy of multiple modelled values. It is defined as

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{|c_i - m_i|}{|c_i|}, \tag{2.1}$$

where $c_i$ represents the observed value and $m_i$ denotes the modelled value for data point $i$. The number of data points is denoted with $n$. [23]

For a single value comparison measure, the Absolute Percentage Error (APE) is used. It is defined as

APE = |c − m|

|c| , (2.2) where c represents the observed value and m denotes the modelled value. Both the MAPE and the APE measures relative deviation. [8]
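Equations (2.1) and (2.2) translate directly into code. The following is a minimal Python sketch for illustration; the function names and sample values are hypothetical and are not taken from the thesis or from any validation tool used by Trafikförvaltningen.

```python
def ape(c, m):
    """Absolute Percentage Error (2.2) for an observed value c and a modelled value m."""
    if c == 0:
        raise ValueError("APE is undefined when the observed value is 0")
    return abs(c - m) / abs(c)


def mape(observed, modelled):
    """Mean Absolute Percentage Error (2.1): the mean of the APE over n data points."""
    if not observed or len(observed) != len(modelled):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(ape(c, m) for c, m in zip(observed, modelled)) / len(observed)


# Illustrative link loads (hypothetical values, not data from this thesis).
observed = [100, 200, 400]
modelled = [110, 180, 400]
print(f"MAPE = {mape(observed, modelled):.1%}")  # MAPE = 6.7%
```

Because each term divides by $|c_i|$, links with an observed load of zero must be excluded or handled separately before the MAPE is computed.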

The Mean Absolute Deviation (MAD) is a measure of the absolute deviations between the observed values and the modelled values. It is defined as

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}|c_i - m_i|, \tag{2.3}$$

where $c_i$ denotes the observed value and $m_i$ denotes the modelled value for data point $i$. The number of data points is denoted with $n$. [19, p. 212]

For a single value comparison, the Absolute Deviation (AD) is used. It is defined as

$$\mathrm{AD} = |c - m|, \tag{2.4}$$

where $c$ represents the observed value and $m$ denotes the modelled value. The MAD and the AD measure absolute deviation. [8]
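The contrast with the relative measures can be sketched in a few lines of Python. The function names and sample values below are illustrative assumptions only.

```python
def ad(c, m):
    """Absolute Deviation (2.4) for an observed value c and a modelled value m."""
    return abs(c - m)


def mad(observed, modelled):
    """Mean Absolute Deviation (2.3) over n data points."""
    if not observed or len(observed) != len(modelled):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(ad(c, m) for c, m in zip(observed, modelled)) / len(observed)


# Illustrative values only: the first two points both deviate by 10 %,
# but their absolute deviations are 10 and 20 passengers.
observed = [100, 200, 400]
modelled = [110, 180, 400]
print(mad(observed, modelled))  # 10.0
```

Unlike the MAPE, the MAD is expressed in the unit of the data (here passengers), so it weights deviations on heavily loaded links more than the same relative deviation on lightly loaded ones.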

When validating transport models, a common metric to use is the GEH. It is a single value measure that quantifies the deviation between the observed value and the modelled value. This metric takes into account both the absolute and the relative deviation and is often used for assessing hourly traffic volumes. It is defined as

$$\mathrm{GEH} = \sqrt{\frac{2\,(m - c)^2}{m + c}}, \tag{2.5}$$

where $c$ denotes the observed value and $m$ denotes the modelled value. A GEH value less than 5 is considered acceptable. A value between 5 and 10 indicates the need for further investigation, while a value greater than 10 implies inaccuracy. [8]
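A minimal Python sketch of equation (2.5) together with the interpretation bands quoted above. The function names, the label strings and the sample pairs are hypothetical illustrations, not part of any official GEH implementation.

```python
import math


def geh(m, c):
    """GEH statistic (2.5) for a modelled value m and an observed value c."""
    if m + c <= 0:
        raise ValueError("GEH requires m + c > 0")
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


def classify_geh(value):
    """Map a GEH value to the bands quoted in the text (< 5, 5-10, > 10)."""
    if value < 5:
        return "acceptable"
    if value <= 10:
        return "needs investigation"
    return "inaccurate"


# Illustrative (modelled, observed) pairs, not data from this thesis.
for m, c in [(110, 100), (1100, 1000), (160, 100), (1500, 1000)]:
    g = geh(m, c)
    print(f"m={m}, c={c}: GEH={g:.2f} ({classify_geh(g)})")
# m=110, c=100: GEH=0.98 (acceptable)
# m=1100, c=1000: GEH=3.09 (acceptable)
# m=160, c=100: GEH=5.26 (needs investigation)
# m=1500, c=1000: GEH=14.14 (inaccurate)
```

The first two pairs both have a 10% relative deviation, yet the GEH grows with volume, which illustrates how the metric combines relative and absolute deviation.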

The Mean-Square-Error (MSE) is a measure of the squared deviations between the observed values and the modelled values. It is defined as

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(c_i - m_i)^2, \tag{2.6}$$

where $c_i$ denotes the observed value and $m_i$ denotes the modelled value for data point $i$. The number of data points is $n$. [9, p. 29-30]

The Root-Mean-Square-Error (RMSE) is the square root of the MSE. It is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_i - m_i)^2} = \sqrt{\mathrm{MSE}}, \tag{2.7}$$

where $c_i$ denotes the observed value and $m_i$ denotes the modelled value for data point $i$. The number of data points is $n$.


A common variation of the RMSE that takes the size of the observed values into account is the %RMSE. It is defined as

$$\%\mathrm{RMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_i - m_i)^2}}{\frac{1}{n}\sum_{i=1}^{n}c_i} = \frac{\mathrm{RMSE}}{\bar{c}}, \tag{2.8}$$

where $c_i$ denotes the observed value and $m_i$ denotes the modelled value for data point $i$. The number of data points is $n$. The average of the observed values is denoted with $\bar{c}$. A value less than 30% is considered acceptable. [18]

Another metric, Theil's inequality coefficient (Theil's U), is defined as

$$\text{Theil's } U = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_i - m_i)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}c_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}m_i^2}} \in [0, 1], \tag{2.9}$$

where $c_i$ denotes the observed value and $m_i$ denotes the modelled value for data point $i$. The number of data points is denoted with $n$. The model is said to have perfect forecasting ability if the value of Theil's U is equal to 0. In practice, a value less than 0.5 is considered good and a value less than 0.1 is considered excellent. [19, p. 213-214]
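The squared-error family in (2.6) to (2.9) shares one building block, which the following Python sketch makes explicit. It assumes positive observed values (so that $\bar{c} > 0$); the function names and sample values are hypothetical.

```python
import math


def _check(observed, modelled):
    if not observed or len(observed) != len(modelled):
        raise ValueError("need two non-empty sequences of equal length")


def mse(observed, modelled):
    """Mean-Square-Error (2.6)."""
    _check(observed, modelled)
    return sum((c - m) ** 2 for c, m in zip(observed, modelled)) / len(observed)


def rmse(observed, modelled):
    """Root-Mean-Square-Error (2.7)."""
    return math.sqrt(mse(observed, modelled))


def pct_rmse(observed, modelled):
    """%RMSE (2.8) as a fraction: RMSE divided by the mean observed value."""
    error = rmse(observed, modelled)
    c_bar = sum(observed) / len(observed)  # assumed > 0
    return error / c_bar


def theils_u(observed, modelled):
    """Theil's U (2.9); 0 means a perfect match."""
    _check(observed, modelled)
    n = len(observed)
    denominator = math.sqrt(sum(c * c for c in observed) / n) + math.sqrt(
        sum(m * m for m in modelled) / n
    )
    return rmse(observed, modelled) / denominator


observed = [100, 200, 400]  # illustrative values only
modelled = [110, 180, 400]
print(f"RMSE = {rmse(observed, modelled):.2f}")          # RMSE = 12.91
print(f"%RMSE = {pct_rmse(observed, modelled):.1%}")     # %RMSE = 5.5%
print(f"Theil's U = {theils_u(observed, modelled):.3f}")  # Theil's U = 0.025
```

With these values, the %RMSE of about 5.5% would fall well below the 30% limit quoted above.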

The Pearson correlation coefficient (Cor) is a statistical metric that denotes the linear relationship between the data sets $M$ and $C$. It is defined as

$$\mathrm{Cor}(M, C) = \frac{\sum_{i=1}^{n}(m_i - \bar{m})(c_i - \bar{c})}{\sqrt{\sum_{i=1}^{n}(m_i - \bar{m})^2}\,\sqrt{\sum_{i=1}^{n}(c_i - \bar{c})^2}} \in [-1, 1], \tag{2.10}$$

where $\bar{m}$ denotes the average of the set $M$ and $\bar{c}$ denotes the average of the set $C$. The modelled value is denoted with $m_i$ and the observed value is denoted with $c_i$ for data point $i$. The number of data points is denoted with $n$. [9, p. 70-71]

A value of 1 reflects a perfect positive correlation. A value of 0 indicates no correlation and a value of -1 reflects a perfect negative correlation. In transport modelling, a correlation over 0.70 is considered to be sufficient while a correlation over 0.85 is considered to be very sufficient. [19, p. 187-188]
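A minimal Python sketch of (2.10) and the quoted bands. The names, the label for values at or below 0.70 (for which the text gives no label) and the sample values are assumptions for illustration. The example shows that correlation alone does not detect a systematic bias.

```python
import math


def pearson(modelled, observed):
    """Pearson correlation coefficient Cor(M, C), equation (2.10)."""
    if len(modelled) != len(observed) or len(modelled) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(modelled)
    m_bar = sum(modelled) / n
    c_bar = sum(observed) / n
    cov = sum((m - m_bar) * (c - c_bar) for m, c in zip(modelled, observed))
    s_m = math.sqrt(sum((m - m_bar) ** 2 for m in modelled))
    s_c = math.sqrt(sum((c - c_bar) ** 2 for c in observed))
    if s_m == 0 or s_c == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (s_m * s_c)


def classify_cor(r):
    """Bands quoted in the text for transport models."""
    if r > 0.85:
        return "very sufficient"
    if r > 0.70:
        return "sufficient"
    return "below threshold"  # hypothetical label


observed = [100, 200, 400]
modelled = [200, 400, 800]  # every value overestimated by 100 %
r = pearson(modelled, observed)
print(round(r, 6), classify_cor(r))  # 1.0 very sufficient
```

Although every modelled value is twice the observed one, the correlation is perfect, so Cor is best combined with a metric that penalises absolute or relative deviation.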

The coefficient of determination (R²) denotes the proportion of the total variance of the observed values that can be explained by the corresponding modelled values. It is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(c_i - m_i)^2}{\sum_{i=1}^{n}(c_i - \bar{c})^2}, \tag{2.11}$$

where $c_i$ denotes the observed value and $m_i$ denotes the modelled value for data point $i$. The average of the observed values is denoted with $\bar{c}$. The number of data points is $n$. For transport models, a value greater than 0.5 is considered acceptable, a value greater than 0.75 is considered sufficient and a value greater than 0.85 is considered very sufficient. [19, p. 188-190] In the linear regression setting, where the modelled values are replaced by a least-squares line of best fit with an intercept and there is only one independent variable, R² equals the squared Pearson correlation coefficient and lies between 0 and 1. When the observed values are compared directly with the modelled values, as in (2.11), R² still has an upper bound of 1 but becomes negative if the modelled values fit worse than the mean of the observed values.
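The difference between the two R² variants can be sketched in Python as follows. `r2_reference` implements (2.11) directly, that is, against the reference line where the observed value equals the modelled value, while `r2_best_fit` first fits an ordinary least-squares line, observed ≈ a + b · modelled. Both names and the sample values are hypothetical.

```python
def r2_reference(observed, modelled):
    """R^2 as in (2.11): observed values compared directly with modelled values."""
    if not observed or len(observed) != len(modelled):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    c_bar = sum(observed) / n
    ss_res = sum((c - m) ** 2 for c, m in zip(observed, modelled))
    ss_tot = sum((c - c_bar) ** 2 for c in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


def r2_best_fit(observed, modelled):
    """R^2 after fitting observed ~ a + b * modelled by ordinary least squares."""
    if not modelled or len(observed) != len(modelled):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(modelled)
    m_bar = sum(modelled) / n
    c_bar = sum(observed) / n
    s_mm = sum((m - m_bar) ** 2 for m in modelled)
    if s_mm == 0:
        raise ValueError("cannot fit a line to constant modelled values")
    s_mc = sum((m - m_bar) * (c - c_bar) for m, c in zip(modelled, observed))
    b = s_mc / s_mm
    a = c_bar - b * m_bar
    fitted = [a + b * m for m in modelled]
    return r2_reference(observed, fitted)


observed = [100, 200, 400]
modelled = [200, 400, 800]  # systematic overestimation, illustrative only
print(round(r2_reference(observed, modelled), 3))  # -3.5
print(round(r2_best_fit(observed, modelled), 3))   # 1.0
```

The line-of-best-fit R² absorbs the systematic overestimation into the slope and reports a perfect fit (equal to the squared Pearson correlation), whereas the reference-line R² is strongly negative and exposes the bias.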

2.1.2.

Examples of Case Studies

The validation metrics listed above are commonly used when validating transport models and other statistical models. This section gives an overview of the validation metrics and methods used by different governments and other actors. The case of Sweden is excluded since it was briefly described in Section 1.5.

Copenhagen, Denmark

In Copenhagen, a validation of the route assignment model was done, comparing data from travel surveys as well as manual on-vehicle counts with the outputs from the model for the years 2000-2004. The validation metrics used were the %RMSE and the GEH. In addition, the R² value was measured, using a reference line where the observed values equal the modelled values. [30]

An acceptable limit of 5 was set for the GEH, together with a threshold for the share of observations that should fall below this limit. The GEH was seen to give good results at low passenger volumes. For the %RMSE, the results differed significantly between measurements of different sizes, and the reasons for this were discussed. The %RMSE also showed large discrepancies at large volumes. Due to these perceived weaknesses, the R² was measured to complement the results. This provided insight into systematic biases (over- or underestimation). [30]

Oslo, Norway

In a technical documentation, the Norwegian Public Roads Administration describes the validation and calibration process of their transport model, which is based on the four step demand model. The transport model was validated against the results from a national travel survey conducted in 2009. It is stated that deviations between the modelled and observed values can occur due to errors in the implementation of the four step demand model. Because of this, a validation is usually performed on all four steps. In Norway, the APE is used as a validation metric to obtain a percentage deviation. In addition, the GEH is used, because it takes both the relative and the absolute deviation into account. A GEH value less than 5 is considered to be very sufficient, while a value less than 10 is considered to be acceptable. Values larger than 10 require further investigation. However, a large transport model such as this one is complex and contains many links, and it may not be possible to satisfy the acceptable thresholds for every link simultaneously. Thus, it is stated that at least 85% of the links should satisfy the acceptable thresholds. [28]
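The Norwegian criteria can be expressed as a small check: compute the GEH, $\sqrt{2(m-c)^2/(m+c)}$, for every link, classify it, and test whether at least 85% of the links meet the acceptable threshold. The sketch below assumes that the 85% rule refers to the acceptable threshold GEH < 10; the function names and the three example links are hypothetical.

```python
import math


def geh(m, c):
    """GEH statistic for a modelled flow m and an observed flow c (both >= 0)."""
    if m + c == 0:
        return 0.0  # both flows zero: no deviation
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


def classify_geh(value):
    """Norwegian thresholds as described in the text."""
    if value < 5:
        return "very sufficient"
    if value < 10:
        return "acceptable"
    return "needs further investigation"


def share_within(modelled, observed, limit=10.0):
    """Share of links whose GEH is below `limit` (default: acceptable threshold)."""
    values = [geh(m, c) for m, c in zip(modelled, observed)]
    return sum(v < limit for v in values) / len(values)


# Hypothetical link loads for three links.
modelled = [150.0, 100.0, 300.0]
observed = [100.0, 100.0, 100.0]
share = share_within(modelled, observed)
print(f"{share:.0%} of links acceptable; 85% criterion met: {share >= 0.85}")
```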

United Kingdom

In 2014, the Department for Transport in the United Kingdom released a Transport Analysis Guidance (TAG) intended for model practitioners. It states that the validation process of the public transport model should include validation of trip tables, network and service validation, and assignment validation. It is recommended that validation of assignment, services and trip tables is performed by comparing modelled values with observed values using the APE. It is recommended that, across modelled screenlines, the total modelled passenger flow should not deviate more than 15% from the observed values. On individual links, the modelled flows should not deviate more than 25% from the observed values, except where the observed passenger flows are low, i.e. below 150 passengers per hour. [7]
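The two TAG criteria for public transport assignment described above can be sketched as follows. How links with observed flows below 150 passengers per hour are treated is an assumption here: they are simply exempted from the 25% check. The function names and example flows are hypothetical.

```python
def screenline_ok(modelled, observed, tolerance=0.15):
    """Total modelled flow across a screenline within 15% of the observed total."""
    total_m, total_c = sum(modelled), sum(observed)
    return abs(total_m - total_c) <= tolerance * total_c


def link_ok(m, c, tolerance=0.25, low_flow=150):
    """Modelled flow on one link within 25% of the observed flow.
    Links with observed flows below `low_flow` (passengers/hour) are
    exempted from the check (an assumption of this sketch)."""
    if c < low_flow:
        return True
    return abs(m - c) <= tolerance * c


# Hypothetical hourly passenger flows on the links crossing one screenline.
modelled = [240.0, 130.0, 610.0]
observed = [200.0, 100.0, 560.0]
print("screenline:", screenline_ok(modelled, observed))
print("links:", [link_ok(m, c) for m, c in zip(modelled, observed)])
```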

When validating the link loads in the highway assignment model, the Department for Transport recommends that the metrics MAPE, MAD and GEH are used. The metrics are considered to be consistent, and modelled flows that satisfy the acceptable thresholds are regarded as sufficient. For the GEH, a value less than 5 in more than 85% of the cases is considered acceptable. For flows less than 700 vehicles/hour, the absolute deviation should be at most 100 vehicles/hour in more than 85% of the cases. For flows between 700 vehicles/hour and 2700 vehicles/hour, the percentage error should be at most 15% in more than 85% of the cases. Lastly, for flows greater than 2700 vehicles/hour, the absolute deviation should be at most 400 vehicles/hour in more than 85% of the cases. [6]

South East Queensland, Australia

The calibration and validation methodology for a route assignment model in South East Queensland, Australia, is outlined in a recent report. Here, AFC (Automatic Fare Collection) data from smart cards are compared to outputs from a route assignment model and a travel demand model, and both origin-destination matrices and route assignments are validated. The %RMSE is used as a validation metric. In combination with graphical validation, a regression analysis using the R² as a metric is also performed in the calibration process, where additional validation is done. [22]

In the proposed methodology framework, the calibration and validation procedure is iterative. This is made possible by the quick feedback process, in which new data collected from the AFC system can be used to calibrate parameters in the route assignment model. In this case, a big advantage is the two-point AFC system with both a check-in and a check-out, meaning that the ticket is validated both when boarding a vehicle or entering a station and when alighting a vehicle or exiting a station. This gives detailed and accurate insights into mode choices and exact routes. [22]

New Orleans, USA

In New Orleans, Louisiana, the regional transport model was validated. The purpose of the proposed validation process was to ensure that the modelled values had a sufficient level of correspondence with the observed values. It was found that, among the discussed validation metrics, the best one to use in the case of Louisiana was the R². The R² value was calculated using a reference line where the observed values equal the modelled values. The choice was based on the fact that the R² is not sensitive to small variations, which can be expected in larger transport models. [3]

Tennessee, USA

A report states guidelines for validation of the highway assignment model for the state of Tennessee, and outlines an application of the guidelines to vehicle traffic. The travel demand model used in Tennessee is based on the four step demand model. The model is validated by comparing outputs from the Census Transportation Planning Package (CTPP) and a household travel survey to ground counts of passing vehicles. The metrics used to determine the accuracy of the model are the APE, the Pearson correlation coefficient and the %RMSE. The APE is also computed separately for different road sizes (a functional split). Moreover, the %RMSE is computed separately for ranges of average daily traffic on a link. It is noted that the smaller the road, the higher the variation. [32]

In addition, the deviation between the modelled and observed total distance travelled is measured, which gives an aggregated measure of the accuracy of the model. Some agencies have focused on measurements during the peak hour instead of the daily average, and the APE is used in these cases. It is suggested that the model in Tennessee is validated using all the above-mentioned metrics. However, no explanation is given for why a certain metric should be used. For these validation metrics, it is proposed that the correlation should be greater than 0.88, the %RMSE should be less than 30% and the percentage deviation should not exceed 10%. [32]

2.2.

Qualitative Validation

There is a general consensus in the transport community that validation needs to be performed with a specific purpose: the same validation methodology cannot be used to validate every aspect of a model. [31] [20] With this in mind, it is important that the actor validating a model has a thorough understanding of the system to be validated, in order to use the right validation metrics and data sources. [20] Understanding the system and the underlying concepts and assumptions is generally seen as part of the conceptual model verification process.

2.2.1.

Expert Interviews

Conducting expert interviews is a commonly used qualitative method in model validation. Interviews are often an important component of the verification process, though usually not the main part, and are often seen as a tool to reduce model uncertainty. [2] The interviewees may or may not be involved in the model development process, depending on the size of the system being modelled. The interviews should preferably cover as wide a group of experts as possible, to avoid biased opinions on what to validate. In addition, it is important to be aware of previous model validation methods and of possible incentive structures that may give a skewed view. The interviews should aim to give a better understanding of the system itself and of the model, as well as of the process of collecting the data against which the model is validated. Expert interviews can also be used in the calibration process, to understand the causal relationships between presumably erroneous input variables and the specific deviations found in the validation process. In smaller systems, this may also be an iterative process, where hypotheses are generated by experts and then confirmed through modelling and validation.

2.2.2.

Travel Surveys

Another qualitative method used in the validation process is travel surveys. Travel surveys are often used to confirm the underlying assumptions in the travel demand model, for instance assumptions related to passenger behaviour and socioeconomic factors, which relate primarily to the first three steps of the four step model. Travel surveys can also be used to collect actual observations, and the results can then be used to validate the model by comparing the modelled values with the observed values. However, in many cases the demand models themselves are based on travel surveys of some kind, which makes the surveys a questionable source for validation. [30] For example, a reporting bias in the surveys would be present in both the model and the validation data, and would therefore not be revealed by the validation.

2.3.

Evaluation of Methods

In the previous section, an overview of validation methods was given. The validation methods have different advantages and disadvantages, and it is important that these, together with the context in which the model is used, are taken into consideration when choosing a validation method. Due to the low number of large-scale implementations of validation and calibration methodologies, the advantages and disadvantages of using a specific method in a specific situation and context are still unclear. In this section, an evaluation of the quantitative and qualitative validation methods described above is presented, with a focus on the validation metrics.

2.3.1.

Quantitative Validation

The MAPE is one of the most used validation metrics. It has a relatively easy interpretation, namely the mean of the absolute percentage errors. Moreover, it is a scale-independent metric, i.e. its value does not change when both the observed and the modelled values are multiplied by the same constant. [12] As a result, the metric can be used to compare the model's accuracy between different data sets. A key characteristic of a good accuracy metric is that it is scale-independent. [15] Another property of this metric is that it is symmetric to the observed value, meaning that, for a given observed value, a positive deviation yields the same metric value as a negative deviation of the same size. In addition, the metric has no unit, which is desirable as it allows for a fair comparison between different models and data sets. [8] However, the metric has a significant disadvantage: it returns undefined or very large values if the observed values are zero or close to zero. [12]

Moreover, the same absolute error yields different absolute percentage errors depending on the size of the observed value. [15] For example, if $c = 100$ and $m = 150$, the APE becomes

$$\mathrm{APE} = \frac{|c - m|}{|c|} = \frac{|100 - 150|}{|100|} = 50\%.$$

If, on the other hand, $c = 150$ and $m = 100$, the APE becomes

$$\mathrm{APE} = \frac{|c - m|}{|c|} = \frac{|150 - 100|}{|150|} \approx 33\%.$$
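A minimal sketch of the MAPE that reproduces the two APE values above and illustrates the scale independence discussed earlier. Raising an error for zero observations is a design choice of this sketch, reflecting the disadvantage noted above; the last pair of series is hypothetical.

```python
def mape(observed, modelled):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("expected two non-empty sequences of equal length")
    if any(c == 0 for c in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    errors = [abs(c - m) / abs(c) for c, m in zip(observed, modelled)]
    return 100.0 * sum(errors) / len(errors)


print(mape([100], [150]))            # c = 100, m = 150
print(round(mape([150], [100]), 1))  # c = 150, m = 100
# Multiplying both series by the same constant leaves the MAPE unchanged.
print(mape([100, 200], [150, 180]), mape([1000, 2000], [1500, 1800]))
```

The first two lines print 50.0 and 33.3, matching the worked examples.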

The MAD is, like the MAPE, a metric with a relatively intuitive interpretation, namely the mean of the absolute deviations. However, it is a scale-dependent metric. [12] A property of the MAD is that it is symmetric to the observed value. [8] However, since the MAD can range from 0 to positive infinity, it becomes difficult to interpret the model's accuracy for data sets with high variance. The GEH can be considered as a normalised version of the APE, since it divides the squared error by the average of the observed and modelled values before taking the square root. [3] Unlike the APE, the GEH yields the same value when the observed and modelled values are interchanged. For example, if $m = 150$ and $c = 100$, the GEH becomes

$$\mathrm{GEH} = \sqrt{\frac{2(m - c)^2}{m + c}} = \sqrt{\frac{2(150 - 100)^2}{150 + 100}} \approx 4.47.$$

If $m = 100$ and $c = 150$, the GEH becomes

$$\mathrm{GEH} = \sqrt{\frac{2(m - c)^2}{m + c}} = \sqrt{\frac{2(100 - 150)^2}{100 + 150}} \approx 4.47.$$

However, one problematic property of the GEH is that a modelled value that overestimates the observed value yields a lower GEH value than a modelled value that underestimates it by the same amount. This implies that the metric is not symmetric to the observed value. [8] For instance, if $m = 90$ and $c = 100$, the GEH becomes

$$\mathrm{GEH} = \sqrt{\frac{2(m - c)^2}{m + c}} = \sqrt{\frac{2(90 - 100)^2}{90 + 100}} \approx 1.03.$$

If, on the other hand, $m = 110$ and $c = 100$, the GEH becomes

$$\mathrm{GEH} = \sqrt{\frac{2(m - c)^2}{m + c}} = \sqrt{\frac{2(110 - 100)^2}{110 + 100}} \approx 0.98.$$
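The GEH values in the examples above can be reproduced with a few lines of Python. The extra pair ($m = 1500$, $c = 1000$), which is not among the original examples, has the same 50% relative deviation as the first pair but ten times the volume, and illustrates the scale dependence of the metric.

```python
import math


def geh(m, c):
    """GEH statistic: sqrt(2 (m - c)^2 / (m + c))."""
    return math.sqrt(2 * (m - c) ** 2 / (m + c))


# Swapping m and c leaves the GEH unchanged (first two pairs), while an
# overestimation of 10 scores lower than an underestimation of 10 (next two).
# The last pair has the same relative error as the first at ten times the volume.
for m, c in [(150, 100), (100, 150), (90, 100), (110, 100), (1500, 1000)]:
    print(f"m = {m:>4}, c = {c:>4}: GEH = {geh(m, c):.2f}")
```

This prints GEH values of 4.47, 4.47, 1.03, 0.98 and 14.14.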

Furthermore, the GEH is a scale-dependent metric. That is, the acceptable threshold value needs to be adjusted based on the size of the observations. For instance, a threshold suitable for evaluating hourly traffic volumes might not be suitable for evaluating daily traffic volumes. Moreover, the GEH has a unit, and like the MAD, it can range from 0 to positive infinity.

The GEH has specific advantages when used to validate travel demand models. In traffic planning, it is often more vital to dimension highly trafficked routes correctly, as both the costs of infrastructure investments and the consequences of congestion increase exponentially with scale. A relative deviation of approximately 50% might be acceptable for smaller routes, while unacceptable for larger ones. In addition, in public transport, large routes often function as connection points, allowing faulty estimates to propagate in the system. [8]

The MSE is recognised as one of the most commonly used validation metrics in statistical modelling in general. [10] However, like the MAD and the GEH, the MSE is a scale-dependent metric. [12] Moreover, outliers have a greater effect on the MSE than on the MAD, since the MSE is defined as the mean of the squared deviations. Thus, one large error can result in an MSE value that understates the accuracy of the model. On the other hand, if all the errors are smaller than 1 in absolute value, squaring makes them even smaller, and the MSE can overstate the accuracy of the model. This makes the MSE less reliable than the other metrics mentioned. [21] Due to these disadvantages, the MSE is relatively difficult to interpret. The RMSE is sometimes preferred over the MSE since it is on the same scale as the data and thus easier to interpret. [10]

The %RMSE is a commonly used metric due to its sensitivity to large absolute deviations between the observed values and the modelled values. Furthermore, it provides a relatively intuitive interpretation and is scale-independent. The %RMSE is recommended by the Federal Highway Administration for validating link loads in travel demand modelling. [18] However, the metric does not always yield a value between 0 and 1, which decreases interpretability. In addition, the use of the %RMSE has been challenged in several respects. One is that it places much stricter requirements on larger measurements. [30] Another is that it does not capture biases in the data but only deviations; in order to understand whether the model over- or underestimates, other metrics must be used.

Theil's U does not provide as intuitive an interpretation as the other metrics mentioned, which may explain why it has not been widely used in practice. Like the MAD, the MSE and the RMSE, it can be distorted by large deviations. [12] However, it is able to capture the bias, correlation and variance between the data sets. [3] Thus, the metric provides a powerful measure that encapsulates the issues mentioned above, which are considered to be relevant in transport modelling. Moreover, Theil's U provides a value between 0 and 1. [19, p. 213-214]
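One common way to make the statement that Theil's U captures bias, variance and correlation concrete is Theil's decomposition of the mean squared error, $\mathrm{MSE} = (\bar{m} - \bar{c})^2 + (s_m - s_c)^2 + 2(1 - r)\,s_m s_c$, where $s_m$ and $s_c$ are the population standard deviations and $r$ is the Pearson correlation. Dividing each term by the MSE gives bias, variance and covariance proportions that sum to 1. The sketch below assumes this textbook decomposition; it is not necessarily the exact formulation used in [3], and the function name is hypothetical.

```python
import math


def theil_proportions(observed, modelled):
    """Return the (bias, variance, covariance) proportions of the MSE.

    MSE = (m_bar - c_bar)^2 + (s_m - s_c)^2 + 2 (s_m s_c - cov),
    where 2 (s_m s_c - cov) equals 2 (1 - r) s_m s_c.
    """
    n = len(observed)
    if n == 0 or n != len(modelled):
        raise ValueError("expected two non-empty sequences of equal length")
    c_bar = sum(observed) / n
    m_bar = sum(modelled) / n
    s_c = math.sqrt(sum((c - c_bar) ** 2 for c in observed) / n)  # population std
    s_m = math.sqrt(sum((m - m_bar) ** 2 for m in modelled) / n)
    cov = sum((c - c_bar) * (m - m_bar) for c, m in zip(observed, modelled)) / n
    mse = sum((c - m) ** 2 for c, m in zip(observed, modelled)) / n
    if mse == 0:
        raise ValueError("perfect fit: the proportions are undefined")
    u_bias = (m_bar - c_bar) ** 2 / mse
    u_var = (s_m - s_c) ** 2 / mse
    u_cov = 2 * (s_m * s_c - cov) / mse
    return u_bias, u_var, u_cov


# A model that overestimates every link by 50 passengers: all error is bias.
print(theil_proportions([100, 200, 300], [150, 250, 350]))
```

For this purely biased example the bias proportion is 1 and the other two are numerically 0, whereas a model that is correct on average but scattered would put most of the weight on the covariance proportion.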

As mentioned before, the R² is equal to the squared correlation in simple linear regression, i.e. with only one independent variable. However, since the focus here is to validate the model's accuracy by measuring the deviation between the modelled values and the observed values, the R² in this case denotes the proportion of the total variance of the observed values that can be explained by the corresponding modelled values, using a reference line where the observed values equal the modelled values. Since no line of best fit is developed between the modelled and observed values, the total squared deviation of the observed values from the modelled values can exceed the total variance of the observed values. It is therefore possible to obtain a negative R² value. Furthermore, since the metric depends on the variance of the observed values, a low variance in the observed values can yield a low R² value despite small deviations between the observed and modelled values. This effect increases as the number of data points decreases.
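The variance effect described above can be illustrated with two hypothetical data sets that have exactly the same absolute error of 2 passengers on every link but very different spread in the observed values. The sketch evaluates Eq. (2.11) against the reference line where the modelled values equal the observed values.

```python
def r_squared(observed, modelled):
    """R^2 from Eq. (2.11) against the line m = c."""
    c_bar = sum(observed) / len(observed)
    ss_res = sum((c - m) ** 2 for c, m in zip(observed, modelled))
    ss_tot = sum((c - c_bar) ** 2 for c in observed)
    return 1.0 - ss_res / ss_tot


# Both hypothetical cases have an absolute error of 2 on every link.
low_var_obs = [100, 101, 99, 100]
low_var_mod = [102, 99, 101, 98]
high_var_obs = [100, 300, 500, 700]
high_var_mod = [102, 298, 502, 698]
print(r_squared(low_var_obs, low_var_mod))    # -7.0
print(r_squared(high_var_obs, high_var_mod))  # close to 1 (about 0.9999)
```

The deviations are identical, but the low-variance case gets a strongly negative R² because its total variance (2) is much smaller than its sum of squared errors (16).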

Figure 2.2 shows an example of an R² value obtained with and without using a line of best fit. The red dashed line indicates a line of best fit, while the black line indicates a line where the modelled values equal the observed values.

Figure 2.2: Example of the R² with and without using a line of best fit.

The R² value obtained using the line of best fit indicates that the model performs relatively well. However, in this example, the model systematically underestimates the observed values. Thus, an R² value based on the black line is a more desirable and accurate way of measuring the accuracy of the model.
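The situation in Figure 2.2 can be reproduced numerically. In the hypothetical example below, the model returns exactly half of every observed link load. A least-squares line of best fit explains all of the variance (its R² equals the squared Pearson correlation, here 1), whereas Eq. (2.11) evaluated against the line where the modelled values equal the observed values becomes negative.

```python
observed = [100, 200, 300, 400, 500]
modelled = [0.5 * c for c in observed]  # model systematically halves every load

n = len(observed)
c_bar = sum(observed) / n
m_bar = sum(modelled) / n

# R^2 against the reference line m = c (Eq. 2.11).
ss_res = sum((c - m) ** 2 for c, m in zip(observed, modelled))
ss_tot = sum((c - c_bar) ** 2 for c in observed)
r2_identity = 1 - ss_res / ss_tot

# R^2 of a least-squares line of best fit = squared Pearson correlation.
cov = sum((m - m_bar) * (c - c_bar) for c, m in zip(observed, modelled))
ss_m = sum((m - m_bar) ** 2 for m in modelled)
r2_best_fit = cov ** 2 / (ss_m * ss_tot)

print(f"R^2 (best fit) = {r2_best_fit:.3f}")  # 1.000
print(f"R^2 (m = c)    = {r2_identity:.3f}")  # -0.375
```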

Moreover, the R² is a metric that is not greatly sensitive to small variations in the data, which can be expected in large-scale public transport models. [3] Furthermore, the metric provides a relatively intuitive interpretation. Because of this, the R² is a common metric used in practice to validate transport models. [8]

Like the R², the Pearson correlation coefficient is a metric that is relatively easy to interpret. In addition, the metric is scale-independent and has no unit. However, its application in model validation is limited, since it only measures the degree of the linear relationship between the observed and modelled values: a model that is off by a constant offset or a constant positive factor on every link still obtains a correlation of 1. On the other hand, this narrow scope increases the interpretability of the metric. A summary of some of the properties of the validation metrics is given in Table 2.1. Note that some properties of the validation metrics can be considered subjective.


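Table 2.1: Summary of some of the properties of the validation metrics.

| Property | MAPE | MAD | GEH | MSE | RMSE | %RMSE | Theil's U | Cor | R² |
|---|---|---|---|---|---|---|---|---|---|
| Scale-independent | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Symmetric to the observed value | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Meaningful unit or no unit | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Value between 0 and 1 | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ᵃ |
| Relatively easy to interpret | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✓ |
| Used in practice | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ |

ᵃ In the non-regression setting.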


2.3.2.

Qualitative Validation

For the specific purpose of evaluating public transport models, expert knowledge is needed. Though some elements of modelled transport systems in different geographic areas might overlap, the systems are still idiosyncratic in many respects. Therefore, expert knowledge gathered through interviews must be an integrated part of large-scale validations.

Travel surveys are often used as input to the first steps of the four step model, and can also be used for validation. The accuracy of the surveys is often low, and they can at best be used to validate origin-destination matrices or aggregated mode choice estimations. In route assignment, however, much more detailed data is needed, and travel surveys often fail to provide this level of detail.


Method

Part of the aim of this thesis is to develop a validation methodology that can be used to validate public transport models. Another part of the purpose of this thesis is to determine the accuracy of the model used by Trafikförvaltningen. This chapter outlines the method used to fulfil these aims.

3.1.

Choice of Validation Metrics

The choice of validation metrics for validating large-scale models, such as the public transport model used by Trafikförvaltningen, requires the consideration of several aspects. In order to choose the most suitable validation metrics, a thorough analysis of their properties and ease of interpretation has been done. The relative importance of the theoretical properties from Table 2.1 has been evaluated based on the different data sets provided by Trafikförvaltningen. A qualitative analysis has also been performed with the model practitioners at Trafikförvaltningen, and their opinions have been taken into consideration when evaluating the relative importance of the theoretical properties of the metrics. From the qualitative analysis, it was desired that the metric should have high interpretability, identify systematic over- or underestimations, easily identify outliers, and capture deviations in individual values. No single metric could fulfil all these requirements, and therefore the following validation metrics were chosen: MAPE, %RMSE and R². However, all validation metrics in Table 2.1 were implemented to allow for comparison.

3.2.

Model Data

The model data used for the validation was taken from PTV Visum, the route assignment software used by Trafikförvaltningen in the Stockholm region. The name of each node was extracted, along with the number of alighting passengers, the number of boarding passengers and the through load (the number of passengers remaining on the vehicle). The data was extracted from the 2014 model in PTV Visum and covers the morning peak hour, which occurs at some point between 6 a.m. and 9 a.m. In order to avoid irregularities in the data, days with special events are not considered. In addition, all Saturdays and Sundays, as well as time periods where deviations from the main timetable occur, are excluded. On these days, the number of travelling passengers differs significantly from weekdays without special events when the main timetable is used. The output is the average flow on the included days.

3.3.

Observed Data Sources

There are many different observed data sources that can be used for validating a public transport model. Data sources can capture both node data (stations and stops) and link data (on-vehicle). To validate the model data with observed data, the extraction of the observed data should be modified to follow the same structure as for the model data. Different data sources available in the Stockholm region are described below.

Automatic Passenger Counting (APC)

The APC system is designed to automatically count the number of alighting and boarding passengers. The system also provides information such as time of arrival and departure per station and stop. In Stockholm, the system is available on approximately 25% of all buses, commuter trains and local trams. The APC system is a sophisticated system consisting of sensors, and is equipped with wireless data communication and a vehicle computer. The observed data from the system is processed and added to a database. [14] The number of alighting and boarding passengers is available to extract for every departure, implying that information regarding the number of passengers on the vehicle (link load) can also be obtained.
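Deriving link loads from boarding and alighting counts amounts to a running balance along the route: the load on the link leaving a stop equals all boardings so far minus all alightings so far. A minimal sketch, assuming the counts for one departure are given in stop order (the function name and example counts are hypothetical). Real APC counts may not balance exactly, so the sketch flags negative loads rather than silently correcting them.

```python
def link_loads(boardings, alightings):
    """Passengers on board after each stop, i.e. the load on the link
    from that stop to the next one."""
    if len(boardings) != len(alightings):
        raise ValueError("expected one boarding and one alighting count per stop")
    loads = []
    load = 0
    for b, a in zip(boardings, alightings):
        load += b - a
        if load < 0:
            raise ValueError("counts imply a negative load; data need balancing")
        loads.append(load)
    return loads


# Hypothetical counts for one departure with four stops.
print(link_loads([10, 5, 3, 0], [0, 4, 6, 8]))  # [10, 11, 8, 0]
```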

Automatic Fare Collection (AFC)

Many public transport systems have an AFC system with the purpose of automating ticket validation. In Stockholm, the SL Access-card is used to travel in the public transport system. Travel tickets are loaded onto the card, which are then validated upon boarding a vehicle or entering a station by exposing it to a ticket reader. In the case of the public transport system in Stockholm, data from the AFC system consists only of information regarding when and where the ticket was validated upon boarding a vehicle or entering a station, as tickets are not validated when alighting a vehicle or exiting a station. Thus, no information regarding the travel route of the passengers is available.

(38)

Manual Counting Studies (MTS)

MTS are studies that are performed with the purpose of manually measuring the traffic in the public transport system. These studies are usually performed with tally counters. Different types of data can be collected through MTS, including the number of passengers entering and exiting stations, the number of alighting and boarding passengers on specific vehicles, and the number of passengers currently on the vehicle (link load). Also, the time of arrival and departure for different transport modes at stations and stops can be collected through MTS. [14] The MTS are often conducted at larger stations and stops during selected time periods, thus data from MTS is not as exhaustive as data from the APC or AFC system.

Vehicle Weight

One way of estimating the number of passengers on a given vehicle is to measure the weight of the vehicle, subtract the weight of the empty vehicle and divide the result by a rough estimate of the average weight of a person. In the Stockholm region, this method of estimating the vehicle occupancy is only available on some subway and commuter trains that have a scale installed.

3.4.

Data Management

In this section, the method used to transform the raw data into a form that can be used for analysis and comparison is described. Both the model data and the observed data have been adjusted. In this thesis, a line link is defined as a link connecting two consecutive stations or stops on a specific line. An aggregated link is defined as the aggregation of all line links connecting the same two consecutive stations or stops.

3.4.1.

Model data

To prepare the data for analysis, certain data preparation steps had to be taken. In PTV Visum, certain names for stations and stops are used, which differ from the official names. Since the official names are used in the APC data, a name mapping from PTV Visum to the names in the APC data was made. The names were identified through exact and approximate matching, as well as checking names against time tables for specific lines. In addition, each line was mapped to its corresponding transport mode by using public information available on the public transport system of Stockholm.
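The exact-then-approximate name matching can be sketched with Python's standard-library `difflib.get_close_matches`. The similarity measure and the cutoff of 0.6 (difflib's default) are assumptions of this sketch, not the rule used in the thesis, and the example names are hypothetical. Names that return `None` would still need the manual check against timetables described above.

```python
import difflib


def match_name(name, official_names, cutoff=0.6):
    """Map a PTV Visum stop name to an official (APC) name: exact match
    first, otherwise the closest approximate match above `cutoff`."""
    if name in official_names:
        return name
    candidates = difflib.get_close_matches(name, official_names, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None  # None: check manually


official = ["Uppsala C", "Tumba"]
for visum_name in ["Tumba", "Tumba stn", "Uppsala Central", "Xyz"]:
    print(visum_name, "->", match_name(visum_name, official))
```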

Each row in the model data represents a station/stop on a line in a certain direction for a certain line variation. For each row, the station/stop, the number of boarding and alighting passengers, and the through load are given. To obtain the number of passengers travelling on each link, the model data was adjusted to include the name of the previous station/stop. This was done by numbering each station/stop on a line. To obtain the number of people travelling on a specific line link and line variation, the number of people boarding at the previous station/stop was added to the through load at the previous station/stop. This was done to allow comparison with the observed link load data. To obtain the total link load for each line link, the values for each direction on a specific line link were aggregated. For the first station/stop on each route, the link load was set to 0 and the previous station name to "Unknown", following the structure of the APC data. Finally, in order to obtain data for the whole time period from 6 a.m. to 9 a.m., all values were scaled by a factor of 2, which is an approximation used by Trafikförvaltningen.
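The link-load construction for the model data can be sketched as follows, assuming the rows of one line variation are already sorted in running order as (stop, boardings, alightings, through load) tuples. The function name and example values are hypothetical; the scaling factor of 2 is the approximation described above.

```python
def model_line_links(rows, scale=2.0):
    """rows: stops of one line variation in running order, each a tuple
    (stop, boardings, alightings, through_load).
    Returns (previous_stop, stop, link_load) for every stop."""
    links = []
    for i, (stop, _board, _alight, _through) in enumerate(rows):
        if i == 0:
            # First stop on the route: no incoming link, as in the APC data.
            links.append(("Unknown", stop, 0.0))
            continue
        prev_stop, prev_board, _prev_alight, prev_through = rows[i - 1]
        # Load on the link = boardings at the previous stop + passengers
        # who stayed on board there, scaled from the peak hour to 6-9 a.m.
        links.append((prev_stop, stop, (prev_board + prev_through) * scale))
    return links


print(model_line_links([("A", 10, 0, 0), ("B", 5, 4, 6), ("C", 0, 11, 0)]))
```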

Bus

For the transport mode bus, the different variations are denoted with "A", "C" or "E" for one direction and "B", "D" or "F" for the other direction, at the end of the line number. These variations can cover only a part of the main route. The different variations were aggregated by direction to correctly give the total number of people travelling on each line link and from each stop. In addition, variations can skip certain stops on the route or be a combination of lines. The skip-stop routes are denoted with "X", "Y" or "City", and the line combinations are denoted with both line numbers connected with a slash.

Commuter train

For commuter train, the different lines are denoted with a "P" followed by abbreviations of the first and last station, such as "P Upp-Tum", representing the commuter train route from the station Uppsala C to Tumba. The corresponding line numbers were identified through publicly available information. Since commuter trains have longer dwell times at key stations to adjust for timetable deviations, these extra times are modelled as separate line links from the station to itself. To adjust for this, the numbers of boarding and alighting passengers were aggregated for the previous and current station, and the number of alighting passengers at the current station was subtracted from the through load at the previous station. The extra line links were then removed.

Local tram

For local tram, the lines are denoted by the type of tram, such as "RB" (Roslagsbanan) or "SB" (Saltsjöbanan), followed by abbreviations of the first and last station. The line numbers for local tram were identified in the same way as for commuter train. Moreover, due to construction work during 2014, lines 27 and 28 were consolidated into line 27/28.

3.4.2.

Observed Data

The data source used for observed data was the APC data, which was available for the transport modes bus, commuter train and local tram.


APC

The APC data was exported for the period 2014-09-01 to 2014-11-30, measuring the daily average values for the number of boarding and alighting passengers and the link load for the time interval from 6 a.m. to 9 a.m. Since timetables vary slightly in this period, the main timetable (with the highest number of APC observations) was selected for the analysis. The data consisted of link data for each line link and its specific variation (with different start and/or end points). It also contained node data with the number of boarding and alighting passengers at the current stop.

Since only about 25% of the departures were measured with APC, the values were linearly scaled to obtain an estimate of the total values. The scaling factor for each link and station/stop was calculated as the number of total departures divided by the number of departures where an APC measurement had been registered.

To obtain the daily averages for bus for the time interval 6 a.m. to 9 a.m., the values were first scaled for each line link and half hour interval of departure as described above. The time intervals were then aggregated.
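The departure-based scaling for bus can be sketched as below. Each record is assumed to hold the APC value summed over the measured departures in one half-hour interval for one line link, together with the numbers of measured and planned departures; the record layout, the function name and the handling of intervals without measurements are hypothetical.

```python
from collections import defaultdict


def scale_bus_link_loads(records):
    """records: (line_link, half_hour, apc_value, apc_departures, total_departures).
    Each value is scaled by total / measured departures, then summed over
    the half-hour intervals of 6-9 a.m. for each line link."""
    totals = defaultdict(float)
    for link, _half_hour, value, apc_deps, total_deps in records:
        if apc_deps == 0:
            # No APC measurement to scale from; a real implementation would
            # need an explicit rule for such gaps (assumption: skipped here).
            continue
        totals[link] += value * total_deps / apc_deps
    return dict(totals)


records = [
    (("A", "B", "1"), "06:00", 40.0, 2, 4),  # 40 * 4 / 2 = 80
    (("A", "B", "1"), "06:30", 30.0, 1, 3),  # 30 * 3 / 1 = 90
]
print(scale_bus_link_loads(records))
```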

To obtain the daily averages for the time interval from 6 a.m. to 9 a.m. for the transport modes commuter train and local tram, the values for each departure were first obtained by aggregating the measurements from the individual wagons of the train, as each wagon is registered as a separate unit. However, in some cases, measurements were only available for some of the wagons, and the total train measurement was then scaled. The scaling factor was calculated as the total number of planned wagons divided by the number of wagons with a registered APC measurement. Finally, the average value per departure was multiplied by the number of planned departures in the time period.
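The wagon-level scaling for commuter train and local tram can be sketched in the same way. The sketch assumes a constant number of planned wagons per departure and takes the average over the measured departures; both are simplifications, and the function names and numbers are hypothetical.

```python
def departure_load(wagon_values, planned_wagons):
    """Sum the measured wagons and scale up by planned / measured wagons."""
    if not wagon_values:
        raise ValueError("no wagon measured on this departure")
    return sum(wagon_values) * planned_wagons / len(wagon_values)


def period_load(departures, planned_wagons, planned_departures):
    """Average the scaled train loads over the measured departures, then
    multiply by the number of planned departures in the period."""
    loads = [departure_load(w, planned_wagons) for w in departures]
    return sum(loads) / len(loads) * planned_departures


# Two measured departures of a six-wagon train: one with only three wagons
# measured, one fully measured; eight departures are planned in the period.
print(period_load([[30, 34, 26], [40] * 6], planned_wagons=6, planned_departures=8))
```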

Following the same procedure as for local tram in the model data, lines 27 and 28 in the observed data were consolidated into line 27/28.

The observed data included line variations. The number of boarding passengers, alighting passengers and the link loads were therefore aggregated for each line link to obtain unique line links. This was done for all transport modes.
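Aggregating the variations into unique line links amounts to summing the counts over all variations that share the same previous stop, current stop and line. Here is a sketch with a hypothetical row layout and hypothetical stop names:

```python
from collections import defaultdict

# Hypothetical row layout:
# (line, variation, previous_stop, current_stop, boarding, alighting, link_load)
Row = tuple[str, str, str, str, float, float, float]
LinkKey = tuple[str, str, str]


def unique_line_links(rows: list[Row]) -> dict[LinkKey, tuple[float, ...]]:
    """Sum boarding, alighting and link load over variations of the same line link."""
    sums: dict[LinkKey, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for line, _variation, previous_stop, current_stop, boarding, alighting, load in rows:
        acc = sums[(previous_stop, current_stop, line)]
        acc[0] += boarding
        acc[1] += alighting
        acc[2] += load
    return {key: tuple(values) for key, values in sums.items()}


rows = [
    ("4", "full route", "Stop A", "Stop B", 12.0, 3.0, 150.0),
    ("4", "short turn", "Stop A", "Stop B", 5.0, 1.0, 60.0),
    ("4", "full route", "Stop B", "Stop C", 8.0, 20.0, 138.0),
]
print(unique_line_links(rows))
# {('Stop A', 'Stop B', '4'): (17.0, 4.0, 210.0), ('Stop B', 'Stop C', '4'): (8.0, 20.0, 138.0)}
```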

3.5. Validation Process

To validate the model used by Trafikförvaltningen, the selected validation metrics were implemented. The validation was performed on the transport modes bus, commuter train and local tram. A unique identifier was created to match corresponding values exactly between the model data and the observed data. It consisted of the name of the previous stop, the name of the current stop and the line number, and was used for all transport modes. The unique identifier was then used to create a consolidated file with both modelled and observed values. In addition, a non-unique identifier consisting of the names of the previous stop and the current stop was created to analyse the aggregated links.

Several counts were identified to indicate the reliability and accuracy of the data:

- for each line, the number of line links and of matching line links;
- for each line series group, the number of matching lines and of complete lines;
- for each aggregated link, the number of matching line links and of complete line links.

A matching line (or aggregated link) is one with at least one matching line link. A complete line (or aggregated link) is one where all line links match. In addition, the proportion of APC observations was identified for each aggregated link, line and group.

For all transport modes, the average link load was validated. For bus, this was done on line level as well as on group level, where lines operating in the same geographical area were grouped. Table 3.1 gives the lines belonging to each group and that group's operational area. For validation on group level, a single value was taken for each line instead of one value for each line link. In addition, the link loads on the 100 largest aggregated links, ranked by modelled link load, were validated. For commuter train, the validation covered the average link load on line level and the link loads on the 10 largest aggregated links, ranked by modelled link load. For local tram, the average link load was validated on line level only. No link loads were validated for aggregated links, as 96.32% of the aggregated links contained only one line link.
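The matching and line classification described above could be sketched as follows. It uses plain dictionaries keyed by the unique identifier (previous stop, current stop, line). The function names and example data are hypothetical. One part is an assumption about which data set the definition refers to: a line counts as complete here when all of its line links in the model data have a match.

```python
from collections import defaultdict

# Unique identifier of a line link: (previous stop, current stop, line number).
LinkKey = tuple[str, str, str]


def consolidate(model: dict[LinkKey, float],
                observed: dict[LinkKey, float]) -> dict[LinkKey, tuple[float, float]]:
    """Keep only exactly matching line links, as (modelled, observed) pairs."""
    return {key: (value, observed[key]) for key, value in model.items() if key in observed}


def line_status(model: dict[LinkKey, float],
                matched: dict[LinkKey, tuple[float, float]]) -> dict[str, str]:
    """Label each line 'complete', 'matching' or 'unmatched' from its model line links."""
    total: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for _prev, _stop, line in model:
        total[line] += 1
    for _prev, _stop, line in matched:
        hits[line] += 1
    status = {}
    for line, n_links in total.items():
        if hits[line] == n_links:
            status[line] = "complete"  # every line link matched
        elif hits[line] > 0:
            status[line] = "matching"  # at least one line link matched
        else:
            status[line] = "unmatched"
    return status


def aggregate_links(matched: dict[LinkKey, tuple[float, float]]) -> dict[tuple[str, str], tuple[float, float]]:
    """Sum modelled and observed loads over all lines sharing (previous stop, current stop)."""
    sums: dict[tuple[str, str], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for (prev, stop, _line), (mod_load, obs_load) in matched.items():
        sums[(prev, stop)][0] += mod_load
        sums[(prev, stop)][1] += obs_load
    return {key: (m, o) for key, (m, o) in sums.items()}


model = {("A", "B", "1"): 100.0, ("B", "C", "1"): 80.0,
         ("A", "B", "2"): 50.0, ("D", "E", "2"): 10.0, ("C", "D", "3"): 20.0}
observed = {("A", "B", "1"): 90.0, ("B", "C", "1"): 70.0,
            ("A", "B", "2"): 60.0, ("X", "Y", "3"): 5.0}
matched = consolidate(model, observed)
print(line_status(model, matched))  # {'1': 'complete', '2': 'matching', '3': 'unmatched'}
print(aggregate_links(matched))     # {('A', 'B'): (150.0, 150.0), ('B', 'C'): (80.0, 70.0)}
```

Building the consolidated data by iterating over the model dictionary keeps the model's ordering and drops unmatched links on either side.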


Table 3.1: Bus line operational area by number series.

| Line number series | Operational area |
|---|---|
| 1-99 | Stockholm city |
| 100-199 | Stockholm municipality, outside the custom borders |
| 200-299 | Lidingö municipality |
| 300-399 | Ekerö municipality and Mälaröarna |
| 400-499 | Nacka and Värmdö municipalities |
| 500-599 | Northern municipalities (Solna, Sundbyberg, Sollentuna, Järfälla, Upplands Bro, Upplands Väsby and Sigtuna) |
| 600-699 | Roslagen (including Danderyd, Täby, Österåker and Norrtälje municipalities) |
| 700-799 | Huddinge, Botkyrka, Salem, Södertälje and Nykvarn municipalities |
| 800-899 | Tyresö, Haninge and Nynäshamn municipalities |
| 900-999 | Local service lines |
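For the group-level validation, a bus line's group follows from the hundreds digit of its line number. The sketch below assumes purely numeric line numbers between 1 and 999; lines with letter suffixes would need extra parsing. The helper names and example values are hypothetical, and some area names are shortened from Table 3.1.

```python
# Operational areas from Table 3.1, keyed by the hundreds digit of the line number.
OPERATIONAL_AREA = {
    0: "Stockholm city",
    1: "Stockholm municipality, outside the custom borders",
    2: "Lidingö municipality",
    3: "Ekerö municipality and Mälaröarna",
    4: "Nacka and Värmdö municipalities",
    5: "Northern municipalities",
    6: "Roslagen",
    7: "Huddinge, Botkyrka, Salem, Södertälje and Nykvarn municipalities",
    8: "Tyresö, Haninge and Nynäshamn municipalities",
    9: "Local service lines",
}


def bus_group(line_number: int) -> int:
    """Return the series group: 0 for 1-99, 1 for 100-199, ..., 9 for 900-999."""
    if not 1 <= line_number <= 999:
        raise ValueError(f"line {line_number} is outside the series in Table 3.1")
    return line_number // 100


def group_lines(line_values: dict[int, float]) -> dict[int, list[float]]:
    """Collect one value per line (e.g. its average link load) into its series group."""
    groups: dict[int, list[float]] = {}
    for line, value in line_values.items():
        groups.setdefault(bus_group(line), []).append(value)
    return groups


print(OPERATIONAL_AREA[bus_group(176)])  # Stockholm municipality, outside the custom borders
print(group_lines({4: 520.0, 1: 480.0, 176: 300.0}))  # {0: [520.0, 480.0], 1: [300.0]}
```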

3.6. Data Summary

In this section, a brief summary of the model data and the observed data for each transport mode is presented.

3.6.1. Model Data

Bus

The model data for bus consisted of 14,700 line links, of which 10,776 had an exact match with a line link in the observed data. Matching errors therefore caused a loss of 26.69% in the number of observations and 18.62% in travel volume, based on the modelled values. The model data contained 388 bus lines.

Commuter Train

The model data for commuter train consisted of 167 line links, of which 166 had an exact match with a line link in the observed data. Matching errors therefore caused a loss of 0.60% in the number of observations. The model data contained four commuter train lines.

Local Tram

The model data for local tram consisted of 229 line links, of which 170 had an exact match with a line link in the observed data. Matching errors therefore caused a loss of 25.76% in the number of observations and 15.24% in travel volume, based on the modelled values. The model data contained seven local tram lines (lines 27 and 28 are consolidated in the validation).
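The losses in the number of observations reported above are the share of model line links without an exact match. The short check below reproduces that arithmetic from the link counts. The loss in travel volume would presumably be computed the same way, with summed modelled link loads instead of link counts. That is an assumption, and it is not reproduced here.

```python
def matching_loss_pct(model_links: int, matched_links: int) -> float:
    """Share of model line links without an exact match in the observed data, in percent."""
    return 100 * (model_links - matched_links) / model_links


for mode, model_links, matched_links in [
    ("Bus", 14_700, 10_776),
    ("Commuter train", 167, 166),
    ("Local tram", 229, 170),
]:
    print(f"{mode}: {matching_loss_pct(model_links, matched_links):.2f}%")
# Bus: 26.69%
# Commuter train: 0.60%
# Local tram: 25.76%
```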

3.6.2. Observed Data

Bus

For bus, the total number of scheduled departures was 6,013,306. Of these, 4,464,122 were included in the analysis because they took place on a line link with an exact match in the model data. Of the included departures, 627,163 were measured with APC, an observation rate of 14.05%. These observations were made on 18,307 bus line links, of which 10,776 had an exact match in the model data. These links were distributed over 409 bus lines in the Stockholm region.

Commuter Train

For commuter train, the total number of scheduled departures was 20,041. Of these, 20,001 were included in the analysis because they took place on a line link with an exact match in the model data. Of the included departures, 19,844 were measured with APC, an observation rate of 99.22%. These observations were made on 167 commuter train line links, of which 166 had an exact match in the model data. These links were distributed over four commuter train lines in the Stockholm region.

Local Tram

For local tram, the total number of scheduled departures was 36,082. Of these, 35,474 were included in the analysis because they took place on a line link with an exact match in the model data. Of the included departures, 28,284 were measured with APC, an observation rate of 79.73%. These observations were made on 174 local tram line links, of which 169 had an exact match in the model data. These links were distributed over seven local tram lines in the Stockholm region.
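The observation rates above are the APC-measured departures divided by the departures included in the analysis, not by all scheduled departures. A quick check of that arithmetic:

```python
def observation_rate_pct(measured_departures: int, included_departures: int) -> float:
    """APC-measured departures as a share of the departures included in the analysis."""
    return 100 * measured_departures / included_departures


for mode, measured, included in [
    ("Bus", 627_163, 4_464_122),
    ("Commuter train", 19_844, 20_001),
    ("Local tram", 28_284, 35_474),
]:
    print(f"{mode}: {observation_rate_pct(measured, included):.2f}%")
# Bus: 14.05%
# Commuter train: 99.22%
# Local tram: 79.73%
```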

3.7. Thresholds and Scenarios

The thresholds for the chosen validation metrics drew on three sources: the recommendations provided in Section 2.1.1, the thresholds used in practice (see Section 2.1.2), and a qualitative analysis performed with the model practitioners at Trafikförvaltningen. The thresholds for the chosen metrics are given in Table 3.2.


Table 3.2: Thresholds for the validation metrics used.

| Validation metric | Accepted | Sufficient | Very sufficient |
|---|---|---|---|
| MAPE | < 30% | - | - |
| %RMSE | < 30% | - | - |
| R² | > 0.5 | > 0.75 | > 0.85 |
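The sketch below checks modelled against observed link loads using the thresholds in Table 3.2. It assumes the common textbook definitions, and the definitions given in Chapter 2 take precedence where they differ. MAPE is the mean of |modelled − observed| / observed in percent. %RMSE is the RMSE divided by the mean observed value in percent. R² is computed as 1 − SS_res/SS_tot, with the modelled values used directly as predictions. This is one of several R² variants. The function names and example loads are hypothetical.

```python
import math
from collections.abc import Sequence

# Thresholds from Table 3.2 (MAPE and %RMSE in percent).
MAPE_ACCEPTED = 30.0
RMSE_ACCEPTED = 30.0
R2_LEVELS = {"accepted": 0.5, "sufficient": 0.75, "very sufficient": 0.85}


def mape(modelled: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes every observed value is non-zero."""
    return 100 * sum(abs(m - o) / o for m, o in zip(modelled, observed)) / len(observed)


def pct_rmse(modelled: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error relative to the mean observed value, in percent."""
    n = len(observed)
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(modelled, observed)) / n)
    return 100 * rmse / (sum(observed) / n)


def r_squared(modelled: Sequence[float], observed: Sequence[float]) -> float:
    """1 - SS_res / SS_tot, using the modelled values directly as predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for m, o in zip(modelled, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def r2_level(r2: float) -> str:
    """Strictest R² level in Table 3.2 that the value exceeds."""
    for level in ("very sufficient", "sufficient", "accepted"):
        if r2 > R2_LEVELS[level]:
            return level
    return "not accepted"


modelled = [110.0, 190.0, 310.0, 390.0]
observed = [100.0, 200.0, 300.0, 400.0]
print(round(mape(modelled, observed), 2))       # 5.21
print(round(pct_rmse(modelled, observed), 2))   # 4.0
print(round(r_squared(modelled, observed), 3))  # 0.992
print(r2_level(r_squared(modelled, observed)))  # very sufficient
```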

Each of the three validation metrics either meets its accepted threshold or does not, so 2³ = 8 different scenarios can occur. These are presented in Table 3.3. If a stricter threshold is desired for R², the sufficient or very sufficient threshold may be used instead.

Table 3.3: Validation metric criteria for the different scenarios.

| Scenario | MAPE | %RMSE | R² |
|---|---|---|---|
| 1 | Accepted | Accepted | Accepted |
| 2 | Not accepted | Accepted | Accepted |
| 3 | Accepted | Not accepted | Accepted |
| 4 | Accepted | Accepted | Not accepted |
| 5 | Not accepted | Not accepted | Accepted |
| 6 | Not accepted | Accepted | Not accepted |
| 7 | Accepted | Not accepted | Not accepted |
| 8 | Not accepted | Not accepted | Not accepted |
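Each scenario is a combination of three pass/fail outcomes, so Table 3.3 can be encoded as a lookup from the outcomes to a scenario number. This is a sketch with hypothetical names. The optional `r2_threshold` parameter allows the stricter R² levels of Table 3.2 (0.75 or 0.85) to replace the accepted level of 0.5.

```python
# Table 3.3: (MAPE accepted, %RMSE accepted, R² accepted) -> scenario number.
SCENARIOS = {
    (True, True, True): 1,
    (False, True, True): 2,
    (True, False, True): 3,
    (True, True, False): 4,
    (False, False, True): 5,
    (False, True, False): 6,
    (True, False, False): 7,
    (False, False, False): 8,
}


def scenario(mape_pct: float, rmse_pct: float, r2: float, r2_threshold: float = 0.5) -> int:
    """Classify metric values into a scenario using the accepted thresholds of Table 3.2."""
    key = (mape_pct < 30.0, rmse_pct < 30.0, r2 > r2_threshold)
    return SCENARIOS[key]


print(scenario(25.0, 35.0, 0.6))                     # 3
print(scenario(25.0, 20.0, 0.8, r2_threshold=0.85))  # 4
```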

