Regression-based evaluation of bicycle flow trend estimates


ScienceDirect

Available online at www.sciencedirect.com

Procedia Computer Science 130 (2018) 518–525

10.1016/j.procs.2018.04.073

Peer-review under responsibility of the Conference Program Chairs.

1877-0509 Available online at www.sciencedirect.com

www.elsevier.com/locate/procedia

The 9th International Conference on Ambient Systems, Networks and Technologies (ANT 2018)


Johan Holmgren^{a,b,∗}, Gabriel Moltubakk^{a}, Jody O’Neill^{a}

^{a} Department of Computer Science and Media Technology, Malmö University, Malmö 205 06, Sweden
^{b} K2 (The Swedish Knowledge Centre for Public Transport)

Abstract

It has been shown in previous research that regression modeling can be used to predict the number of bicycles registered by a bicycle counter. To improve the prediction accuracy, it has also been suggested that a long-term trend curve estimate can be incorporated in the regression problem formulation. A long-term trend curve estimate aims to capture those factors that are difficult, or even impossible, to explicitly model as input variables in the regression model. In the current paper, we present a regression-based approach for evaluating long-term trend curve estimates with respect to their potential to improve the regression prediction accuracy for bicycle counter data. We illustrate our approach by applying it to a time series recorded by a bicycle counter in Malmö, Sweden. For the considered data set, our experimental results indicate that a polynomial of degree two, fitted to the time series, gives the best prediction.

© 2018 The Authors. Published by Elsevier B.V.

Keywords: Bicycle counter, regression, trend curve, evaluation

1. Introduction

The bicycle has become an important part of urban transport due to its ability to contribute to fast, sustainable, and cost-efficient transport. The bicycle contributes to a healthy, active lifestyle, and its popularity is accentuated by the increase in bicycling that can be observed around the world. In addition, the bicycle has strong advantages when it comes to parking and storage, as it requires a relatively small amount of space. Due to the positive effects of bicycling, there is an increasing interest from public authorities in increasing the use of the bicycle. However, in order to achieve a modal shift towards bicycling (from motorized transport), it is important to increase the attractiveness of the bicycle. This can be achieved by implementing various types of policy measures, including the construction and improvement of biking infrastructure, such as bicycling lanes and safe parking facilities. Other initiatives include bicycle sharing systems, which are currently under development around the world [1,2]. Bicycle sharing systems enable, for example, fast multimodal passenger transport, where public transport and the bicycle can be combined in an efficient way [1]. The fast development of electrical bicycles further increases the attractiveness of the bicycle [3].

∗ Corresponding author. Tel.: +46-40-665 76 88; fax: +46-40-665 76 46. E-mail address: johan.holmgren@mau.se

However, to be able to build a transport system that encourages bicycling, it is important to build knowledge about the current bicycle flows, and about which factors are involved in the decision-making of potential bicyclists when choosing whether to use the bicycle, to use some other mode of transport (e.g., car or bus), or not to travel at all. According to Damant-Sirois and El-Geneidy [4], there are four categories of determinants of bicycling: individual characteristics (including age and gender), individual attitudes (including safety perception and pro-environmental attitude), social environment, and the built environment (i.e., bicycle infrastructure). In the short term, it has been shown that weather plays an important role in the decision whether or not to use the bicycle [5].

Public authorities commonly use bicycle counters, which automatically and continuously register the bicycles that pass some strategically chosen points in the traffic network, in order to collect information about the bicycle flows in an urban area. The data produced by a bicycle counter is a time series, where each of the data points in the series corresponds to the number of registered bicycles during a particular time period, for example, an hour or a day. The number of registered bicycles varies over time based on several factors, including the current weather conditions, time of the day, time of the year, and the current interest of the citizens in using the bicycle as a transport mode. In a recent study, Holmgren et al. [6] (see also Aspegren & Dahlström [7]) show how regression can be used to quantify how external factors are expected to influence the bicycle traffic flows at a particular point in a traffic network. In particular, they present a regression model that aims to predict the number of bicycles registered by a bicycle counter, using factors such as day of week, season, and weather (temperature and precipitation) as input variables.

In addition to the factors included in the regression model by Holmgren et al. [6], there also exist other factors that influence how the bicycle flows vary over longer periods of time; factors that are difficult to grasp and to explicitly model as input variables in a regression model, for example, since they are not quantifiable using existing data. Examples of such factors are the citizens’ general tendency to use the bicycle and larger infrastructural changes that lead to new patterns of movement. An example of the latter, within the region of our study, is the opening of a new railway station in the center of Malmö in 2010, which caused major changes in the traveling pattern for many travelers commuting to and from Malmö. In particular, as the traveling patterns change over longer periods of time, the number of bicycles registered by a bicycle counter is also expected to vary. For example, this implies that the number of bicycles registered by the bicycle counter on a “normal” day might differ significantly from the number of bicycles registered by the same bicycle counter on a normal day a few years later. Using this idea, Holmgren et al. [6] indicate that there is potential to improve the regression accuracy by incorporating a long-term trend estimate taken over the time series produced by a bicycle counter. In order to implicitly capture those factors that are expected to influence the bicycle flow, but which are difficult (or undesirable) to model as input variables, they suggest using the deviation from a long-term trend estimate at the bicycle counter as the target variable, instead of the absolute number of bicycles. For illustration, see Fig. 1 for an example of a long-term trend curve estimate for a bicycle counter data time series.

There are different ways to construct trend curve estimates for the data points in a time series. For example, trend curve estimates can be generated by fitting polynomial functions of various degrees to the data points in the time series. Another approach is to use splines. However, as trend curve estimates vary in quality and it is possible to construct a very large number of trend curve estimates, it is important to be able to accurately evaluate and compare the quality of different trend curve estimates.

In the current paper, which is based on the Bachelor’s thesis of Moltubakk & O’Neill [8], we suggest how to use regression modeling to evaluate trend curve estimates for bicycle counter data time series. We formulate the main tasks included in our evaluation approach as a stepwise procedure, including regression model formulation, generation of trend curve estimates, and evaluation using cross validation on a set of chosen regression algorithms. In addition, we illustrate our approach for regression-based evaluation by applying it to a time series recorded by a bicycle counter in Malmö, Sweden.

Our work aims to provide input for passenger transport analysis models used by city and transport planners, e.g., for assessing the impact of transport policy measures. The relevance in this direction is emphasized by the fact that bicycling is currently being incorporated in passenger transport analysis models around the world. As mentioned above, we present an evaluation method for assessing the quality of long-term trend curve estimates on a time series generated by a bicycle counter. This could be used to improve the modeling of bicycle traffic, and in turn increase the knowledge about how travelers choose their mode of transport.

Fig. 1. Example of a long-term trend estimate for a bicycle counter data time series.

The current paper is organized in the following way. In the next section we give an account of previous research related to our work. In Section 3 we present our approach for evaluating the quality of trend curve estimates, followed in Section 4 by an experimental illustration of our approach. Finally, we conclude the paper in Section 5.

2. Related work

Due to the increasing interest in the use of the bicycle as a sustainable alternative to motorized transport, research related to bicycling, in particular bicycle data analysis, has been quite intensive in recent years. For example, Romanillos et al. [9] provide an overview of big data approaches applied in the bicycling context. A large amount of research concerns bicycle sharing systems, where the studied problems include bicycle repositioning [10] and location of base (or docking) stations [11]. Data mining has been applied in the bicycle sharing context, for example, in order to estimate usage patterns [12,13]. Data mining also plays an important role in travel demand estimation (including bicycle demand analysis), which is an integral part of traffic and transport analysis models (both in urban and in regional contexts). Traditionally, travel demand is estimated using travel survey data, often combined with GPS trajectories [14]. Bicycle demand can be further estimated using different types of discrete choice models, which have been used, for example, for bicycle route and destination choice estimations [15]. In addition, there exists research on how various factors, including weather, calendar events, and work-related factors, influence the choice whether or not to use the bicycle [16,17]. Finally, Holmgren et al. [6] contribute a regression model that is able to model how different factors influence the number of bicycles that are expected to be registered by a bicycle counter.

3. Regression-based evaluation

In this section, we present our approach for evaluation of bicycle counter data trend curve estimates, which we describe as a stepwise procedure that captures the main tasks involved in the evaluation process. The central component is a regression model, which is formulated in the initial step of the procedure, and which is used in the final step in order to estimate how well different trend curve estimates support the prediction of the data points in a bicycle counter data time series. As mentioned above, a bicycle counter produces a time series consisting of a sequence of data points corresponding to a sequence of non-overlapping, contiguous, equal-length time periods, for example hours or days. For each of the time periods, the corresponding data point specifies the number of registered bicycles during that period. For future reference, we refer to the bicycle counter data time series under consideration as a function f(t) defined over an ordered set of time periods T, where f(t) denotes the number of registered bicycles for period t ∈ T. The generated trend curve estimates are used in the target variable of the regression problem formulation, which makes it possible to capture how well different trend curve estimates support the prediction of data points in the time series.
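As a minimal sketch of how a trend curve estimate can enter the target variable, the following Python fragment computes the deviation f(t) − c(t) for each period, following the suggestion of Holmgren et al. [6] described in Section 1. The helper names deviation_target and restore_counts, and the small series of counts and trend values, are hypothetical and serve only as illustration; adding the trend back to a predicted deviation is our assumption about how predictions would be compared on the scale of bicycle counts.

```python
from typing import List, Sequence


def deviation_target(counts: Sequence[float], trend: Sequence[float]) -> List[float]:
    """Return f(t) - c(t) for every period t (hypothetical helper)."""
    if len(counts) != len(trend):
        raise ValueError("counts and trend must cover the same periods")
    return [f_t - c_t for f_t, c_t in zip(counts, trend)]


def restore_counts(deviations: Sequence[float], trend: Sequence[float]) -> List[float]:
    """Map (predicted) deviations back to bicycle counts: c(t) + deviation."""
    return [d + c_t for d, c_t in zip(deviations, trend)]


# Made-up values for four consecutive periods, for illustration only.
counts = [120.0, 135.0, 90.0, 160.0]   # f(t)
trend = [110.0, 115.0, 120.0, 125.0]   # c(t), one candidate trend estimate
target = deviation_target(counts, trend)
```

A regression algorithm would then be trained on target instead of counts, so that the trend estimate carries the slowly varying level of the series.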

The evaluation procedure consists of the following sequential steps, which will be discussed in more detail below:

Step 1. Formulate regression model.
Step 2. Generate trend curve estimates.
Step 3. Select regression algorithms.
Step 4. Evaluate trend curve estimates.
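The four steps can be sketched end to end in a few lines of Python. The sketch below is illustrative only and rests on several assumptions that are not taken from the procedure itself: the regression algorithm is a toy per-weekday mean predictor (fit_weekday_mean), the folds are formed by interleaving the periods, the performance measure is the root mean squared error on absolute counts, and the data are synthetic. All function names are hypothetical.

```python
from statistics import mean


def fit_weekday_mean(train_X, train_y):
    """Toy regressor (assumption): predict the mean target per weekday."""
    by_weekday = {}
    for row, y in zip(train_X, train_y):
        by_weekday.setdefault(row[0], []).append(y)
    overall = mean(train_y)
    means = {day: mean(values) for day, values in by_weekday.items()}
    return lambda row: means.get(row[0], overall)


def cv_rmse(fit, X, counts, trend, k=5):
    """k-fold cross-validated RMSE, measured on absolute bicycle counts."""
    squared_errors = []
    for fold in range(k):
        train = [i for i in range(len(counts)) if i % k != fold]
        test = [i for i in range(len(counts)) if i % k == fold]
        # Target variable: deviation of the count from the trend estimate.
        predictor = fit([X[i] for i in train],
                        [counts[i] - trend[i] for i in train])
        for i in test:
            predicted_count = trend[i] + predictor(X[i])
            squared_errors.append((predicted_count - counts[i]) ** 2)
    return mean(squared_errors) ** 0.5


def evaluate_trends(X, counts, trends, algorithms, k=5):
    """Step 4: score every (trend estimate, regression algorithm) pair."""
    return {(trend_name, algo_name): cv_rmse(fit, X, counts, trend, k)
            for trend_name, trend in trends.items()
            for algo_name, fit in algorithms.items()}


def linear_trend(counts):
    """Closed-form least-squares line through the series."""
    n = len(counts)
    t_mean, y_mean = (n - 1) / 2, mean(counts)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(counts))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y_mean + slope * (t - t_mean) for t in range(n)]


# Synthetic daily series: linear growth plus a weekly pattern.
weekday_effect = [30, 25, 20, 20, 15, -40, -50]
counts = [100 + 2 * t + weekday_effect[t % 7] for t in range(70)]
X = [[t % 7] for t in range(70)]                 # Step 1: weekday as input
trends = {"constant": [mean(counts)] * 70,       # Step 2: candidate trends
          "linear": linear_trend(counts)}
algorithms = {"weekday_mean": fit_weekday_mean}  # Step 3
scores = evaluate_trends(X, counts, trends, algorithms)
best = min(scores, key=scores.get)               # Step 4: lowest error wins
```

In this sketch each trend estimate is fitted to the whole series before cross validation, so only the regression model is refitted per fold; whether the trend should also be refitted per fold is a design choice left open here.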

It should be emphasized that we did not include any explicit data processing step; instead it is assumed that the data used in the evaluation is processed within the specified steps. However, it should be mentioned that the data processing includes processing of regression model input data (aggregation, normalization, etc.), handling of outliers and missing data values, generation of trend curve estimates and the regression target values used during evaluation, calculation of regression performance values, etc.

3.1. Step 1. Formulate regression model

The purpose of the initial step of the evaluation approach is to formulate the regression problem that will be used later on to evaluate the quality of a number of generated trend curve estimates for a bicycle counter data time series. This step mainly involves selecting which input variables (also called input features), representing factors that potentially influence the amount of bicycle traffic, should be included in the regression model. It also includes determining the length of the time periods, where each time period corresponds to a data point in the regression problem. It might be desirable to aggregate the data points (i.e., the number of registered bicycles) in the time series to form another time series with longer time periods. For example, in the experimental validation presented in Section 4 of the current paper, we used a bicycle counter data time series where the data points were aggregated from hours to days.
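Aggregating an hourly series into a daily one amounts to summing the counts that share a calendar date. The following sketch shows one way to do this; the function name aggregate_to_days and the three hourly records are made up for illustration and do not come from the Malmö data set.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def aggregate_to_days(hourly: Iterable[Tuple[datetime, int]]) -> Dict[date, int]:
    """Sum hourly bicycle counts into one count per calendar day."""
    daily: Dict[date, int] = defaultdict(int)
    for timestamp, registered in hourly:
        daily[timestamp.date()] += registered
    # Return the days in chronological order.
    return dict(sorted(daily.items()))


# Illustrative hourly records: (start of hour, registered bicycles).
hourly = [
    (datetime(2017, 5, 1, 7), 40),
    (datetime(2017, 5, 1, 8), 95),
    (datetime(2017, 5, 2, 7), 38),
]
daily = aggregate_to_days(hourly)
```

Hours without a record simply contribute nothing to the daily sum, so missing data should be handled before aggregation if gaps are expected.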

However, it should be noted that selecting an appropriate set of input variables is typically not a trivial task and it normally requires several iterations of refining and evaluating the regression model under development. As we expect that different bicycle counter data time series will be influenced by various factors in different ways, we recommend conducting a variable selection study as part of this step. As a starting point, we suggest using year, day of week, time of year, and weather factors (temperature and precipitation) in the regression model; however, there is a large amount of freedom involved when choosing input variables.
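To make the suggested starting point concrete, the sketch below encodes one day as a row of input variables: year, day of week, time of year, temperature, and precipitation. The encoding itself is an assumption on our part, since no particular representation is prescribed: day of week is one-hot encoded and time of year is mapped to a sine and cosine pair so that late December and early January end up close to each other. The function name encode_day and the example values are hypothetical.

```python
import math
from datetime import date
from typing import List


def encode_day(day: date, temperature_c: float, precipitation_mm: float) -> List[float]:
    """Build one row of input variables for a daily data point."""
    # Day of week as seven indicator variables (Monday first).
    weekday_one_hot = [1.0 if day.weekday() == i else 0.0 for i in range(7)]
    # Time of year as a point on the unit circle (assumed encoding).
    angle = 2.0 * math.pi * (day.timetuple().tm_yday - 1) / 365.25
    return [
        float(day.year),
        *weekday_one_hot,
        math.sin(angle),
        math.cos(angle),
        temperature_c,
        precipitation_mm,
    ]


# Illustrative values for a single day.
row = encode_day(date(2017, 5, 1), temperature_c=12.5, precipitation_mm=0.0)
```

A variable selection study, as recommended above, would then compare subsets of these columns rather than take the full row for granted.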

As target variable used in this step, i.e., when selecting an appropriate set of input variables, we suggest using the absolute number of bicycles for each period in the bicycle flow time series. Alternatively, one could use the deviation from the median or from the mean of the time series. Moreover, when experimenting with different sets of input variables, i.e., different regression problem formulations, we suggest testing regression algorithms with different characteristics as different algorithms might perform differently on different data sets.

3.2. Step 2. Generate trend curve estimates

There are many ways to generate trend curve estimates for a time series. Trend curve estimates can be generated, for example, by fitting polynomials of various degrees to the data points in the time series, or by using splines.

As our evaluation approach focuses on the evaluation of trend curve estimates, not the generation of estimates, we do not provide any specific guidelines concerning how trend curve estimates should be generated. However, many time series include seasonal patterns, and we therefore emphasize the importance of taking this into account in order to generate trend curve estimates that do not follow the seasonal variations; our idea is that the seasonal variations should instead be captured explicitly by the input variables in the regression model. For later reference we let C denote the set of generated trend curve estimates, each of which can be regarded as a continuous function defined over the considered time period.
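One possible way to build the set C is to fit polynomials of a few low degrees to the series, as in the following sketch based on NumPy. The choice of degrees, the function name polynomial_trends, and the synthetic series are assumptions made for illustration; splines or other smoothers could populate C equally well.

```python
import numpy as np


def polynomial_trends(counts, degrees=(1, 2, 3)):
    """Fit one polynomial trend per degree; returns {degree: trend values}."""
    counts = np.asarray(counts, dtype=float)
    t = np.arange(counts.size, dtype=float)
    trends = {}
    for degree in degrees:
        # Polynomial.fit rescales t internally, which keeps the
        # least-squares problem well conditioned for long series.
        poly = np.polynomial.Polynomial.fit(t, counts, deg=degree)
        trends[degree] = poly(t)
    return trends


# Synthetic three-year daily series: slow quadratic drift plus a yearly cycle.
t = np.arange(3 * 365, dtype=float)
counts = 1000 + 0.5 * t - 0.0002 * t**2 + 300 * np.sin(2 * np.pi * t / 365)
C = polynomial_trends(counts)
```

Low-degree polynomials fitted over several years are too stiff to trace the yearly cycle, which matches the aim of leaving seasonal variation to the input variables of the regression model.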
