Quantifying Traffic Congestion in Nairobi

Academic year: 2021

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Quantifying Traffic Congestion in Nairobi

ERIC BOJS

Degree Projects in Applied Mathematics and Industrial Economics (15 hp) Degree Programme in Industrial Engineering and Management (300 hp) KTH Royal Institute of Technology year 2020

Supervisor at KTH: Wojciech Chachólski Examiner at KTH: Sigrid Källblad Nordin

TRITA-SCI-GRU 2020:117 MAT-K 2020:018

Royal Institute of Technology School of Engineering Sciences KTH SCI

SE-100 44 Stockholm, Sweden URL: www.kth.se/sci

This study has been carried out within the framework of the Minor Field Studies Scholarship Program, MFS, which is funded by the Swedish International Development Cooperation Agency, Sida.

The MFS Scholarship Program offers Swedish university students an opportunity to carry out two months' field work, usually the student's final degree project, in a country in Africa, Asia or Latin America. The results of the work are presented in an MFS report which is also the student's Bachelor or Master of Science Thesis.

Minor Field Studies are primarily conducted within subject areas of importance from a development perspective and in a country where Swedish international cooperation is ongoing.

The main purpose of the MFS Program is to enhance Swedish university students' knowledge and understanding of these countries and their problems and opportunities. MFS should provide the student with initial experience of conditions in such a country. The overall goals are to widen the Swedish human resources cadre for engagement in international development cooperation as well as to promote scientific exchange between universities, research institutes and similar authorities as well as NGOs in developing countries and in Sweden.

The International Relations Office at KTH the Royal Institute of Technology, Stockholm, Sweden, administers the MFS Program within engineering and applied natural sciences.

Katie Zmijewski Program Officer

MFS Program, KTH International Relations Office

Abstract

This thesis aims to give insight into a novel approach for quantifying car traffic in developing cities. This is necessary to improve efficiency in resource allocation for improvements in infrastructure. The project took the form of a case study of neighborhoods in the city of Nairobi, Kenya.

The approach consists of a method which relies on topics from the field of Topological Data Analysis, together with the use of large data sources from taxi services in the city. With this, both qualitative and quantitative insight can be given about the traffic. The method was proven useful for understanding how traffic spreads, and to differentiate between levels of congestion: quantifying it. However, it failed to detect the effect of previous improvements of infrastructure.

Keywords

Topological Data Analysis, Traffic congestion, Quantification, Big Data, Smart Cities

Summary (Sammanfattning)

The goal of this report is to give insight into an innovative approach for quantifying car traffic in developing cities. This is a necessity for improving resource allocation in the development of infrastructure. The project took the form of a case study in which neighborhoods in Nairobi, Kenya were studied.

The approach comprises a method built on techniques from Topological Data Analysis, together with large data sources from taxi services in the city. This is hoped to give both qualitative and quantitative information about the traffic in the city. The method proved useful for understanding how traffic spreads and for differentiating between levels of traffic, that is, for quantifying it.

Unfortunately, the method failed to prove useful for measuring improvements in infrastructure.

Keywords

Topological data analysis, traffic, quantification, big data, smart cities

Acknowledgements

I would like to express deep gratitude to Jared Ongaro at the University of Nairobi for guidance both in the field of topology and on the roads of Nairobi. I would also like to thank my supervisors Wojciech Chachólski and Julia Liljegren at KTH Royal Institute of Technology for their support and guidance throughout the project.

I would like to extend my thanks to Souzan Hammadi and Jonathan Jilg for insights and motivational support to pursue this study.

This thesis was written within the scope of a Minor Field Study and granted financial support by Sida.

Data sources used:

Data retrieved from Uber Movement, (c) 2020 Uber Technologies, Inc., https://movement.uber.com.

Data © OpenStreetMap contributors.

Author

Eric Bojs <ebojs@kth.se>

CINEK-17, Applied Mathematics KTH Royal Institute of Technology

Examiner

Sigrid Källblad <sigridkn@kth.se>

Department of Mathematical Statistics KTH Royal Institute of Technology

Supervisor

Wojciech Chachólski <wojtek@math.kth.se>

Department of Mathematics

KTH Royal Institute of Technology

Supervisor

Julia Liljegren <julia.liljegren@indek.kth.se>

Department of Industrial Economics and Management KTH Royal Institute of Technology

Contents

1 Introduction
1.1 Background
1.2 Purpose
1.3 Research Questions
1.4 Scope and Limitations
1.5 Previous Research

2 Theoretical Background
2.1 Topological Data Analysis
2.2 Statistical Analysis
2.3 Data in Context

3 Data
3.1 Sources
3.2 Assessment of Data

4 Method
4.1 Subjects of Study
4.2 Mathematical Analysis

5 Results
5.1 Research Question 1
5.2 Research Question 2
5.3 Research Question 3

6 Discussion
6.1 Discussion of Results
6.2 Further Research
6.3 Conclusion

1 Introduction

1.1 Background

As urbanization and population growth extend across the African continent, new problems and challenges inevitably arise. Many of these stem from infrastructure and transportation, and especially from traffic management: with the increasing abundance of motor vehicles, traffic congestion has followed.

Cities in developing countries, many situated in Africa, have a greater challenge to adjust for the demand on their roads than European and American cities. While European cities have seen a gradual increase in car traffic, the number of road users in countries like India, Colombia, and Ghana has been amplified many times over in just a few decades [15]. This has in turn made these countries suffer the absolute most from traffic-related problems.

In Nairobi, a city with merely 4 million inhabitants, traffic congestion is assessed to be among the worst in Africa [23]. Although such a ranking and comparison of cities is difficult, some might say impossible, the negative impact is more obvious. Congestion has an estimated yearly cost of around 9 billion SEK for the city [3] and contributes to a high fatality rate of around 800 people annually [18]. High congestion also contributes to increased pollution, which is both a health and an environmental risk [6].

The congestion in these emerging cities has a multitude of causes, from drivers not being properly educated in techniques that can lower overall congestion, to an absence of traffic lights at many road crossings. Attempts to upgrade road infrastructure are being made but have failed to match current demand. The city tries to combat its traffic problem in several ways, with new highways being built and other modern traffic solutions intended to lower the number of cars on the road.

Since resources used to improve infrastructure are scarce, there is an even greater need for efficiency. A first step towards efficiency is to be able to measure traffic: quantifying it. Once there is a tool to measure traffic, there is a way to use resources more efficiently. The cost of quantifying traffic has turned into a catch-22 for developing cities: one has limited resources and thus needs to be efficient, but finding efficient solutions drains resources. Hence, there is a great need for a low-cost quantification method for traffic within a city.

To build a method for quantification, data is key. In developed cities, data is often collected by the institution that handles traffic itself, in some cases with cameras in conjunction with surveys to find movement patterns. These sources, although good, can be quite costly and are thus not feasible for cities with limited resources.

With the rise of online-based mobility service providers like Uber and Bolt, the cost of collecting data has dropped to practically zero. As the drivers of these app-driven taxi services need a smartphone to operate, the companies can easily save historic travel data across the urban network. Also, since these companies have an economic incentive to reduce traffic in order to improve their services, some have begun to publish their data [16]. Thus, traffic data is no longer limited to rich developed cities but can be available to everyone. This data must be utilized to carry value, and so there is a need for useful methods that can give a qualitative understanding of an urban network.

1.2 Purpose

This report aims to study an approach to quantify traffic congestions within developing cities. The quantification is done by analyzing large chunks of traffic data with methods from the field of topology, more precisely: the field of persistent homology. If proven successful, the introduced methods can give a quantitative insight into the urban traffic network. With this, stakeholders can measure impact and make better-informed decisions when it comes to improving road infrastructure, which potentially creates economic value for the city.

1.3 Research Questions

This project will develop a method, referred to as the Method, built from tools from the field of Topological Data Analysis. The Research Questions, RQs, are studied relating to said method.

RQ0: What role does data have in context to the Method?

RQ1: Can the Method be applied to traffic networks in urbanized developing cities and give qualitative insight?

RQ2: Can the Method differentiate between categories of traffic congestions?

RQ3: Can the Method be used to detect advancements of road infrastructure?

Although it could be seen as a stylistic mistake, or a silent criticism of programming notations where array indices start at 1, RQ0 should be seen as the preamble to the Method and is discussed in Chapter 3. The following Research Questions, which focus on the Method, are described in Chapter 4.

1.4 Scope and Limitations

The project takes the form of a case study of Nairobi, the capital of Kenya. The city, as one of the largest by population in Eastern Africa, should be seen as a good representation of either the current or future state of other developing cities in the nearby regions.

The Method aims to be inexpensive to implement. This is achieved by relying on publicly available data sets, without the need to develop and install new detectors of the kind often used in traffic control. This objective is crucial for the Method's development beyond this project and for its implementation in traffic management in developing cities.

It should be clear to the reader that this project does not aim to create a full-fledged tool that can be used by an institution. However, the Method is developed with the underlying assumption that it could, or will, become one. Thus, a discussion of the context of data is needed from a management perspective. With that, the core aim of the project is to show that the extension of topological data analysis to traffic networks holds ground and can be useful.

1.5 Previous Research

The field of Topological Data Analysis, TDA, has risen in popularity. As data has become abundant, the need for methods to study it has grown. Methods from the field have been applied to a multitude of areas, notably by G. Carlsson and T. Ishkhanov, who showed the use of the field within natural image processing [7].

Studies of methods from TDA applied to traffic networks are scarce. However, an initial case study in Downtown New York City [26] shines a light on the link between the two topics. The article does not present more than a proof-of-concept for traffic network studies and only studies a small neighborhood in a modern city with rigorous traffic planning and control.

Although the potential has been highlighted, the use of quantitative methods based on emerging technologies in developing countries is limited. What is more interesting is that a major difficulty has been identified in implementing them in relation to policy development: governments in developing cities have been identified as lacking the experience to develop sufficient policies [24].

There is a growing body of research on data in the context of the strategy of institutions and companies. With the rise of modern computational technology, both the gathering and the analysis of data have become more accessible. Data has risen to be a crucial component of effective strategy, and understanding it is thus important for longevity [10].

2 Theoretical Background

2.1 Topological Data Analysis

Topology is a field of mathematics that chooses a more general view of sets and spaces to operate within. A popular scientific description of the field will say it is the study of shapes and objects while bending and stretching is allowed, but tearing and gluing is not.

Data analysis is a simplifying process. It is about ignoring most of the information available and extracting what is relevant for a given problem. The same strategy of extracting simplified summaries is also at the core of topology.

In recent years, these two branches have merged, giving rise to Topological Data Analysis, TDA. There is growing evidence testifying to the fact that the shape of data does matter. This thesis will illustrate how one can use the shape and geometry of data to extract relevant information.

2.1.1 Data

Data can be seen as the bridge between the real world and the mathematical universe. In this report, we define data as a finite set X holding some information about the real world around us. Functions can be applied to the elements x ∈ X to reveal that information.

2.1.2 Distance

There are various ways of encoding shape and geometry. The most direct one is via distances. Distance is a function d on data X,

$$d : X \times X \to [0, \infty) \tag{1}$$

satisfying, for all $x, y \in X$:

1. Symmetry: $d(x, y) = d(y, x)$
2. Reflexivity: $d(x, x) = 0$.

The pair (X, d), the set and its distance, is called a distance space [8].
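The two axioms can be checked mechanically on a finite set. The following is a minimal sketch, not taken from the thesis code; the function name `is_distance_space` and the example points are hypothetical. Note that, unlike a metric, a distance in this sense is not required to satisfy the triangle inequality.

```python
from itertools import product

def is_distance_space(points, d):
    """Check non-negativity, symmetry and reflexivity of d on a finite set."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:            # values must lie in [0, infinity)
            return False
        if d(x, y) != d(y, x):     # symmetry
            return False
    return all(d(x, x) == 0 for x in points)  # reflexivity

# Hypothetical example: three points on a line.
points = [0.0, 1.5, 4.0]
print(is_distance_space(points, lambda x, y: abs(x - y)))     # True
print(is_distance_space(points, lambda x, y: max(x - y, 0)))  # False: not symmetric
```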

2.1.3 Simplex

Another way to encode geometry is in the form of simplexes, forming simplicial complexes. A finite non-empty set $\sigma$ is called a simplex of dimension $n = |\sigma| - 1$, or an n-simplex. A 0-simplex is called a vertex, and a 1-simplex is called an edge. For example, the set $\{a, b, c\}$ is a 2-simplex [8].

2.1.4 Simplicial Complex

A collection $K$ of finite non-empty sets is a simplicial complex if, for every element $\sigma$ in $K$ (which is a set itself), every non-empty subset of $\sigma$ is also an element in $K$. For example, the collection $\{\{a\}, \{b\}, \{c\}, \{a, b\}, \{b, c\}\}$ is a simplicial complex, whereas the collection $\{\{a\}, \{b\}, \{c\}, \{a, b\}, \{a, b, c\}\}$ is not, as it does not contain the subset $\{b, c\} \subset \{a, b, c\}$.
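The closure condition in this definition can be tested directly on the two example collections. This is a small illustrative sketch; the function name `is_simplicial_complex` is hypothetical and not part of the thesis code.

```python
from itertools import combinations

def is_simplicial_complex(collection):
    """True if every non-empty subset of every simplex is also in the collection."""
    simplices = {frozenset(s) for s in collection}
    if frozenset() in simplices:   # simplices must be non-empty
        return False
    for sigma in simplices:
        for k in range(1, len(sigma)):
            for face in combinations(sigma, k):
                if frozenset(face) not in simplices:
                    return False
    return True

K1 = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"}]
K2 = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "b", "c"}]
print(is_simplicial_complex(K1))  # True
print(is_simplicial_complex(K2))  # False: {b, c} (and also {a, c}) is missing
```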

For a set X, the collection of all its finite non-empty subsets is a simplicial complex denoted by ∆[X] and called the simplex on the set X.

Elements of a simplicial complex are called its simplices. Since elements of a simplicial complex are sets, it makes sense to talk about their cardinality and dimension as described in the previous section [8].

2.1.5 Vietoris-Rips Complex

The first step in the topological data analysis pipeline is to transform measurements encoded by a distance space into spatial information. In this thesis we use the so-called Vietoris-Rips complex construction (sometimes referred to simply as the Rips complex or Rips construction) for this purpose.

Let $(X, d)$ be a distance space. The Vietoris-Rips complex of $(X, d)$ at scale $t \in [0, \infty)$, denoted by $\mathrm{VR}_t(X, d)$, is the collection of those subsets $\sigma \subset X$ such that $d(x, y) \le t$ for all $x, y$ in $\sigma$. Thus vertices in $\mathrm{VR}_t(X, d)$ are the elements of $X$, and edges in $\mathrm{VR}_t(X, d)$ are pairs of distinct elements in $X$ whose distance does not exceed $t$.

$$\mathrm{VR}_t : (X, d) \to K, \quad t \in [0, \infty). \tag{2}$$

The Vietoris-Rips construction, applied to a distance space $(X, d)$, results not in just one simplicial complex but in a whole family of simplicial complexes $\mathrm{VR}_t(X, d)$, indexed by non-negative reals $t$ in $[0, \infty)$ [8].
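To make the construction concrete, the sketch below enumerates the simplices of a Vietoris-Rips complex by brute force. It is an illustration under stated assumptions rather than the thesis implementation (which relies on Gudhi): the function name, the `max_dim` cap (added because the full complex grows exponentially with the number of points) and the four example points are all hypothetical.

```python
from itertools import combinations

def vietoris_rips(points, d, t, max_dim=2):
    """Simplices of dimension <= max_dim whose pairwise distances are all <= t."""
    simplices = []
    for size in range(1, max_dim + 2):          # a k-simplex has k + 1 vertices
        for sigma in combinations(points, size):
            if all(d(x, y) <= t for x, y in combinations(sigma, 2)):
                simplices.append(frozenset(sigma))
    return simplices

# Hypothetical distance space: four labelled points with Euclidean distance.
coords = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (3, 3)}

def dist(x, y):
    (x1, y1), (x2, y2) = coords[x], coords[y]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

vr_small = vietoris_rips(coords, dist, t=1.0)  # 4 vertices, edges ab and ac
vr_large = vietoris_rips(coords, dist, t=1.5)  # adds edge bc and triangle abc
print(len(vr_small), len(vr_large))            # 6 8
print(set(vr_small) <= set(vr_large))          # True: the smaller scale is contained in the larger
```

Increasing the scale only ever adds simplices, which is the property the next section builds on.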

2.1.6 Persistence

The natural question when constructing a Vietoris-Rips complex is: what value of $t$ should be used? To answer this, the notion of persistence is introduced. The study of persistence views the construction as a filtration: as the value of $t$ increases, the simplicial complex changes. The variable $t$ is discretized into a sequence $(t_i)_{i=1}^{N}$ that increases slowly in $N$ steps from $t_1$ to $t_N$, i.e. $0 \le t_1 < \dots < t_N$ with $t_i \in [0, \infty)$, and

$$\mathrm{VR}_{t_1}(X, d) \hookrightarrow \mathrm{VR}_{t_2}(X, d) \hookrightarrow \dots \hookrightarrow \mathrm{VR}_{t_N}(X, d). \tag{3}$$

These complexes are not independent. Note that if $s < t$, then $\mathrm{VR}_s(X, d)$ is a subcollection of $\mathrm{VR}_t(X, d)$. Thus these complexes form an increasing nested family of simplicial complexes, indexed by non-negative reals, culminating in the simplex $\Delta[X]$ [8].

2.1.7 Barcode

A simple way to study persistence is the use of bars. A bar is a pair $(b, d)$ with $b, d \in \mathbb{R}$, which describes the birth $b$ and death $d$ of a feature of the set through a filtration. Although the use and definition of feature can be quite vague, this report will see it as the persistence of simplexes within the Vietoris-Rips Complex. These can be seen as the development of clusters within the data.

A barcode is a visualization of the persistence: a bar chart with the collection of bars under a filtration as the scale increases. The x-axis shows the development of the filtration value $t$, and the y-axis shows the stack of bars. The bars' positions on the y-axis are sorted by value of birth and secondly by value of death [11].
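As an illustration of how bars arise, the sketch below computes the bars of connected components (clusters) in a Vietoris-Rips filtration with a union-find structure. It assumes the simplest reading of "feature", in which every point is born at scale 0 and a bar dies when its cluster merges into another; the function name `h0_barcode` and the example points are hypothetical, and the thesis itself uses Gudhi rather than this code.

```python
from itertools import combinations

def h0_barcode(points, d):
    """Bars (birth, death) of connected components as the scale t grows."""
    points = list(points)
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process all pairs in order of increasing distance (the filtration order).
    edges = sorted(
        (d(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for t, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # two clusters merge at scale t
            parent[ri] = rj
            bars.append((0.0, t))    # one of them dies here
    if points:
        bars.append((0.0, float("inf")))  # the last cluster never dies
    return bars

# Hypothetical example: four points on a line with gaps 1, 2 and 4.
print(h0_barcode([0.0, 1.0, 3.0, 7.0], lambda x, y: abs(x - y)))
# [(0.0, 1.0), (0.0, 2.0), (0.0, 4.0), (0.0, inf)]
```

The finite deaths equal the gaps between neighbouring clusters, so long bars indicate well-separated groups of points.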

A fundamental theorem of topological data analysis states that the process of assigning a Vietoris-Rips Complex to a distance space (X, d) in order to study features is continuous. This means that small changes in the measurements lead to only small changes in the bar decomposition. Such stability is an absolutely necessary requirement for any method that we would like to use in analysing data [8].

2.1.8 Signature

Stability is the key advantage of the barcoding process. Performing statistical analysis on barcodes is, however, difficult, which is a big disadvantage. For statistical analysis one needs to be able to take averages and expected values, which requires the ability to add and to multiply by real numbers. It is not known how to perform these operations on barcodes. There is thus a need for an additional step that transforms bar decompositions into objects which we can add and divide by numbers. For that purpose, one can use signatures, sometimes also called stable ranks.

A signature is a piece-wise constant function that follows the outline of a barcode by the death of each bar. That is,

$$f : (t_i)_{i=1}^{N} \to [0, \infty) \tag{4}$$

where the value of $f$ is constant on $[t_i, t_{i+1})$ and equal to the rank of the bar on the y-axis. That is, for the $L$ bars $\{(b_i, d_i)\}_{i=1}^{L}$,

$$f(t) = \max\{i : d_i \ge t\}. \tag{5}$$

It should be noted that signatures generally take values in $[0, \infty)$, but in this report a normalized version with values in $[0, 1]$ is used. This means that the signatures are squeezed and stretched to match on the y-axis.
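Equation (5) can be evaluated on a grid of scales as sketched below. Two assumptions are made here that the text does not spell out: the bars are taken to be indexed by decreasing death, so that the largest index $i$ with $d_i \ge t$ equals the number of bars still alive at $t$, and the normalization is taken to be division by the total number of bars. The function name `signature` and the example bars are hypothetical.

```python
def signature(bars, grid, normalize=True):
    """Evaluate f(t) = number of bars with death >= t for each t in grid."""
    deaths = [death for _, death in bars]
    total = len(deaths)
    values = []
    for t in grid:
        alive = sum(1 for death in deaths if death >= t)
        # Assumed normalization: divide by the number of bars to land in [0, 1].
        values.append(alive / total if normalize and total else alive)
    return values

# Hypothetical bars, all born at scale 0.
bars = [(0.0, 1.0), (0.0, 2.0), (0.0, 4.0)]
grid = [0.0, 1.0, 1.5, 3.0, 5.0]
print(signature(bars, grid))                   # starts at 1.0 and falls to 0.0
print(signature(bars, grid, normalize=False))  # [3, 3, 2, 1, 0]
```

Because the result is an ordinary list of numbers on a fixed grid, signatures from different samples can be averaged pointwise, which is what makes them usable for statistics.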

Signatures are an important tool in TDA. They allow a complex network to be digested into the graph of a function. Signatures have been proven to be stable under small variations in the underlying data set and to be able to group data sets with different features [9].

2.2 Statistical Analysis

2.2.1 Principal Component Analysis

Principal Component Analysis, or PCA, is a method for variable reduction that defines a new Euclidean space for an, often large, set of variables, in which new coordinate axes are fixed. The new coordinates are referred to as Principal Components. The components are ordered so that they capture the variance within the data set in decreasing order.

Given a set of normalized variables in matrix form $X = [x_1, \dots, x_n]$, PCA relies on an eigendecomposition of the covariance matrix $C \propto X^{\top} X$ of these variables, in which the eigenvectors and eigenvalues are sorted:

$$C = Q \Lambda Q^{-1} \tag{6}$$

$$Q = [t_1, t_2, \dots, t_n], \quad \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n) \tag{7}$$

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \tag{8}$$

where $\lambda_i$ and $t_i$ are eigenvalues and eigenvectors respectively. The set $X$ can then be reduced to a matrix $X_p$ with $p$ columns:

$$X_p = X Q_p \tag{9}$$

$$Q_p = [t_1, t_2, \dots, t_p], \quad 1 \le p < n. \tag{10}$$

This reduces the original number of variables to a much smaller set with the least amount of information loss [21].
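The case $p = 1$ can be sketched with the standard library alone. The example below is an illustration, not the thesis implementation: it centers the data (it does not rescale it), builds the sample covariance matrix, and finds only the leading eigenpair with power iteration instead of a full eigendecomposition. The function name, the start vector and the toy data are assumptions.

```python
def first_principal_component(rows, iterations=200):
    """Leading eigenpair of the sample covariance matrix, plus projected scores."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (m - 1) for b in range(n)]
           for a in range(n)]

    # Power iteration; assumes the start vector is not orthogonal to the
    # leading eigenvector and that the covariance matrix is not all zeros.
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    cv = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
    eigenvalue = sum(v[a] * cv[a] for a in range(n))   # Rayleigh quotient
    scores = [sum(r[j] * v[j] for j in range(n)) for r in centered]
    return eigenvalue, v, scores

# Toy data lying exactly on the line y = 2x, so one component explains everything.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
eigenvalue, direction, scores = first_principal_component(rows)
print(eigenvalue)   # total variance of the data, 25/3
print(direction)    # proportional to (1, 2)
```

The `scores` list is the one-column reduced matrix: each observation is replaced by its coordinate along the first principal component.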

2.2.2 Welch’s T-Test

Welch's T-Test is a more general form of Student's T-Test. It is a statistical test under the null hypothesis that two samples have equal means. As opposed to Student's T-Test, Welch's test does not assume equal variance of the samples; it does assume that the mean of each sample is normally distributed.

Welch's T-Test creates a test statistic $t$ which follows a t-distribution, $T$. The null hypothesis can be formulated as:

$$H_0 : \text{two sets } X_i, X_j \text{ have equal means} \tag{11}$$

with the p-value:

$$p = P[t \le T] \tag{12}$$

[12].
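The test statistic itself is straightforward to compute. The sketch below uses the standard formulas for Welch's statistic and the Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the two samples are hypothetical. Turning $t$ into a p-value requires the t-distribution, which is not in the Python standard library; SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the full test.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (unequal variances allowed).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical samples with clearly different variances.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(a, b)
print(round(t_stat, 4), round(dof, 4))  # -1.8974 5.8824
```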

2.3 Data in Context

2.3.1 Big Data

Big Data has risen as a label for large sets of data. In the article "New Games, New Rules", I. Constantiou and J. Kallinikos discuss the use of data in the context of strategy. They go beyond a buzzword definition and describe the main characteristic of Big Data, as distinct from other data sets, to be that it is generated by a process: one that is not affected by scaling, as manual labor often would be, since it carries organizational complexity and burden [10].

2.3.2 Smart Cities

The term Smart Cities has been used to describe the shift towards the use of Big Data in the context of city development. N. Alharbi and B. Soh describe the phenomenon in their article "Roles and Challenges of Network Sensors in Smart Cities", which describes a Smart City as one where the city's institutions and stakeholders use different types of sensors to make more informed decisions [1].

2.3.3 3Vs of Big Data

In industrial management, particularly operations management, one will often come across the framework known as the 5 Vs. Each V looks at operations from one perspective. This has been extended to the assessment of data pipelines, sometimes adding an extra V for Veracity and configuring the others to fit. This report presents three of the Vs as a framework to assess Big Data. These are Value, Veracity and Volume.

Value can be seen as the potential economic benefit one gets from using the data, minus the cost of carrying it. It should be clear that data which is not used still carries value to the right beholder. Veracity is the truthfulness of data; it describes both its accuracy and its precision. Lastly, Volume relates to the size of the data set. However, it is also important to see what is not in the data set, i.e. what has been left out and whether this omission is systematic [2].

3 Data

As previously stated, one of the objectives of this project is that the Method can be implemented in the strategy of a traffic institution. With this, there is a need to be able to think of data in a context that relates to this postulate, i.e. RQ0.

Data and strategy link together. Just as one needs to be critical of strategy, there is also a need for qualitative critical thinking in relation to data. Thus, this chapter will give an introduction to the sources and data sets handled in the project, as well as an assessment of them. This chapter will answer RQ0 and hopes to give insight from the perspective of Industrial Management and Economics.

3.1 Sources

3.1.1 OpenStreetMap

To acquire the map, or put more precisely the layout of the road network within the studied area, the map service from OpenStreetMap was chosen, because of its overlap with Uber Movement and because it is open-source.

The map is generated by the website’s community. This means that the mapping is built bottom-up by users contributing changes and development as needed. It has been proven to be a successful business model, known as Crowdsourcing, and is used by a number of companies.

As stated earlier, the company’s map service is open-source. This means that every bit of information is available for download. Although the service carries a wide range of information, this report will focus on the road network provided by the service.

The data is provided as an XML-like file with the extension .OSM, from which a data set is constructed. See Table 3.1 for the format and Table 3.2 for variable information.

3.1.2 Uber Movement

Traffic data was acquired from Uber Movement's open data sets. The service is owned by Uber Technologies Inc., best known for their product Uber. Uber is a ridesharing platform that connects drivers and riders through their phone application.

Table 3.1: Data structure generated from OpenStreetMap.

Node ID     Latitude      Longitude
30092201    -1.3212785    36.8366177
81370831    -1.3031852    36.8241605
...         ...           ...

Table 3.2: Variable definitions for OpenStreetMap.

Variable    Type       Description
Node ID     Integer    Identifier for intersection
Latitude    Float      Geographic position
Longitude   Float      Geographic position

Table 3.3: Data structure generated from Uber Movement.

Start Node ID    End Node ID    Speed
30092201         8366177        34.6
81370831         11620851       24.8
...              ...            ...

Table 3.4: Variable definitions for Uber Movement.

Variable         Type       Description
Start Node ID    Integer    Intersection identifier for the starting position of the road section.
End Node ID      Integer    Intersection identifier for the ending position of the road section.
Speed            Float      The recorded average speed in [km/h] for the road section.

The data is aggregated from the many drivers connected to Uber Technologies Inc.'s platform. Although the data is published with a wide range of information, this project will look at three variables describing a road section in the traffic network.

The data describes the state of the traffic network by the hour of each day. Besides daily information, there is also a set with aggregated data for each hour over a quarter. The data structure is shown in Table 3.3 and a description of the variables can be found in Table 3.4.
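The two sources are linked through the node identifiers. The sketch below shows one way the tables could be joined using only the standard library; it is not the thesis pipeline (which uses Pandas), and the column names and function names are hypothetical. The rows are the sample rows from Tables 3.1 and 3.3.

```python
import csv
import io

# Sample rows from Tables 3.1 and 3.3; the header names are assumptions.
NODES_CSV = """node_id,latitude,longitude
30092201,-1.3212785,36.8366177
81370831,-1.3031852,36.8241605
"""
SPEEDS_CSV = """start_node_id,end_node_id,speed_kmh
30092201,8366177,34.6
81370831,11620851,24.8
"""

def load_nodes(text):
    """Map each node ID to its (latitude, longitude)."""
    return {int(r["node_id"]): (float(r["latitude"]), float(r["longitude"]))
            for r in csv.DictReader(io.StringIO(text))}

def load_segments(text):
    """List of (start node ID, end node ID, speed in km/h)."""
    return [(int(r["start_node_id"]), int(r["end_node_id"]), float(r["speed_kmh"]))
            for r in csv.DictReader(io.StringIO(text))]

def attach_coordinates(nodes, segments):
    """Pair each road section with its endpoint coordinates (None if unknown)."""
    return [{"start": nodes.get(s), "end": nodes.get(e), "speed_kmh": v}
            for s, e, v in segments]

nodes = load_nodes(NODES_CSV)
segments = load_segments(SPEEDS_CSV)
for row in attach_coordinates(nodes, segments):
    print(row)
```

In this small excerpt the end nodes of both road sections are absent from the node table, so their coordinates come back as `None`; a join on the full data sets would need to handle such unmatched identifiers explicitly.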

3.2 Assessment of Data

To answer RQ0, which asks about the role of data in the Method, three subquestions have been put forward to critically assess the sources. These are:

Q1: Why are these sources publicly available?

Q2: Is the data accurate?

Q3: What is not in the data?

Each question has been formulated to relate to one of the three Vs of Big Data: Value, Veracity and Volume.

3.2.1 Value

There is no doubt that data is valuable. However, with that premise it becomes counter-intuitive for companies to publish data for free. In this section, the economic incentive behind Uber Inc. and OpenStreetMap to publish their data will be discussed. This relates to Q1.

Uber Movement states on their website that they are aware of the value they bring to the field of urban mobility planning [17]. What is less clear is the reason for the data being available at no cost. One speculative explanation is that this is a kind of goodwill that strengthens the company's brand. This ties in with a global movement to share data sources, most evident in the European Commission's Horizon 2020 program with its Open Access initiative [19].

OpenStreetMap presents a more interesting case. The map service is owned by the non-profit organization OpenStreetMap Foundation [13] and relies on private donations and contributions. The service is a textbook example of crowdsourcing [5], similar to the Wikimedia Foundation's work on Wikipedia. The website states that its value lies in being a community project. The most common reason people contribute to the project is that they themselves are users of the open-source service and realize the need to keep it up to date.

3.2.2 Veracity

V. Rubin and T. Lukoianova raise the question of the credibility of the creator in their article "Veracity Roadmap" on the topic of Big Data [22]. Q2 can be viewed in terms of the economic incentive to falsify or tamper with data. Thus, on the topic of Veracity, the sources will be assessed on their economic incentive to select and bend the data published.

Uber Movement states that the data is aggregated from Uber Technologies Inc. However, how the data is filtered and aggregated is not clear. Uber allows for transport by Boda Boda, a type of motorcycle taxi found in Eastern Africa. These vehicles are not held up by traffic congestion in the same way as cars. The inclusion of Boda Bodas would therefore affect the veracity of the data for congestion analysis. For more information about the way Uber Movement aggregates information, please refer to [16].

As OpenStreetMap has users add contributions to the service, most of whom are anonymous, the project uses a wide range of tools for quality assurance [25]. Although mapping errors can be found, these are often for smaller roads and in less populated areas [14][20]. Thus, the veracity of the data can be seen as good for use in this project.

3.2.3 Volume

Although it might be interesting to look at the size of Big Data, what comes to be more eye-catching is to see what is missing from the data. This relates to a more quantitative perspective on the sources to see what is missing and why that is.

Here, Uber Movement moves into a state of oddity. To measure the speed on a road segment, it relies on a connected vehicle being present there. Some would say that taxi drivers have the best knowledge of when and where congestion appears and thus know how to avoid it. So, if there is heavy congestion, Uber Movement might have trouble detecting it. However, if there is no congestion because there is no traffic, there is by definition no vehicle on the road. Thus, we might expect to see missing data points both where there is no traffic and where there is heavy traffic.

For OpenStreetMap, the relation to Volume follows the discussion as seen in Veracity.

3.2.4 Smart Cities

It is also important to see that RQ0 bears on whether the Method qualifies the city for the label Smart City. This is not obvious, as the Method aims to be applied to developing cities, which are often not the ones in focus when discussing the term.

As the sources described above qualify as Big Data, the Method, if used, would in turn make the developing city it is used in a Smart City. The issue at hand is, however, how feasible it is to implement the Method in a traffic institution's tool box. It should be clear that this report does not aim for this, but to lay the foundation for such a tool.

4 Method

4.1 Subjects of Study

To answer the three Research Questions introduced in Section 1.3, the project will use the theory described in Chapter 2 together with the data described in Chapter 3.

With the overall goal to quantify traffic congestions in Nairobi, the method relies on empirical data of the traffic network. It is assumed that the traffic network is a stochastic outcome of an unknown distribution but follows certain trends.

The Method presented in the following sections was implemented and analyzed using the programming language Python together with the libraries Gudhi and Pandas. Although Python might not be seen as the first choice within the field of mathematics, its prevalent use in courses at the Royal Institute of Technology and its common use within the field of Topological Data Analysis, evidenced by the development of Gudhi, make it a good choice for this project.

The code for this project can be found on GitHub at [4].

4.1.1 Assumptions

For the Method to function, the following assumptions are made.

First, the traffic on all weekdays follows the same distribution. People, and thus traffic, follow a week-based schedule in which Monday through Friday are working days and the weekend is separate, so the traffic on weekdays should be close to the same.

Second, traffic is a stochastic outcome depending on the hour of the day. As people follow a 24-hour schedule, traffic should depend on the time of day but be the same throughout the week.

Third, the traffic network does not vary significantly within a given month. This should hold for months with no public holidays or longer sequences of red-letter days.

Lastly, the change of one or multiple intersections will have an effect on the overall traffic in a neighborhood.


4.1.2 Relation to RQs

These assumptions can then be used to answer the RQs.

RQ1: Can the method be applied to traffic networks in urbanized developing cities and give qualitative insight?

For this, the traffic network will be described as a distance space from which a simplicial complex is formed. Persistence will be used to produce barcodes, to show that Topological Data Analysis carries qualitative value for understanding the otherwise complex traffic network. This is done by studying the birth and death of simplexes of size 1 and more within the Vietoris-Rips Complex, i.e. a bar is born only when two intersections form an edge, and dies when it connects to a larger simplex.

RQ2: Can the method differentiate between categories of traffic congestions?

With the assumption that the traffic network is a stochastic outcome depending on the hour of the day, given that it is a weekday, signatures will be used to study simplexes of all sizes within the Vietoris-Rips Complex. That is, each intersection is seen as a feature from the beginning and dies as it connects to larger simplexes. These signatures should be stable under the same distance space and can thus be viewed as samples from a state of the traffic network.

RQ3: Can these methods be used to detect advancements of road infrastructure?

To test this, the method will be applied to see whether it can detect the installation of a traffic light in a neighborhood of the city.

4.2 Mathematical Analysis

4.2.1 Quantification Postulates

Two main assumptions about the traffic make it amenable to methods from the field of Topological Data Analysis. Both are needed to close the gap between traffic as a directed network and traffic as a distance space.

Major roads have lanes going in different directions separated from one another. This is exemplified in highway construction, where roads are built with a median strip. Although the common name for these roads is the same, OpenStreetMap will identify them as separate. This makes many roads within the city one-way, which gives the network the characteristics of a directed graph. A filtration of a distance space will, however, not take this into account. It should thus be clear that there is an underlying assumption that traffic on a road affects the intersections at both ends.

Minor roads that are trafficked in both directions have separate speed readings for each direction. However, as the second prerequisite for a distance is symmetry, the speed for these roads was set to the smaller of the two. This reconstruction can be supported by two arguments. First, the traffic in one direction affects the speed of traffic in the other. This holds especially true for a city like Nairobi, where drivers will often drive into the lane of oncoming traffic. Second, as flows are often in only one direction, the other lane is often in a state of near free-flow, which makes it difficult to get a measurement of the speed as fewer cars are on it. Remember, as data is collected by cars, there must be cars on the road to get a reading.
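The reconstruction for two-way roads can be sketched as follows. This is a minimal illustration and not the code in [4]; the dictionary layout, the integer intersection labels and the speed values are assumptions made up for the example.

```python
def symmetrize_speeds(directed_speeds):
    """Collapse directed speed readings into undirected ones.

    directed_speeds maps (from_node, to_node) -> speed in km/h.
    When both directions have a reading, the smaller speed is kept,
    so that d(a, b) == d(b, a).
    """
    undirected = {}
    for (a, b), speed in directed_speeds.items():
        key = (min(a, b), max(a, b))  # order-independent key for the road
        if key not in undirected or speed < undirected[key]:
            undirected[key] = speed
    return undirected


# Hypothetical readings; intersections are labelled with made-up integers.
readings = {
    (1, 2): 18.0,  # congested direction
    (2, 1): 42.0,  # near free-flow direction
    (2, 3): 30.0,  # only one direction has a reading
}
symmetric = symmetrize_speeds(readings)
```

A road with a reading in only one direction simply keeps that reading, which matches the situation where the near free-flow lane has too few cars to be measured.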

Distance Space

Each node in the traffic network is seen as an element of the data set X, which is a finite set. The speed between nodes can be seen as a distance, thus creating a distance space (X, d).

Vietoris-Rips Complex & Filtration

The distance space can then be used to create a filtration of Vietoris-Rips Complexes where the scale goes from 0 km/h (no speed) to 50 km/h (free-flow in urban areas).
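For the features that are present from the start (the intersections themselves), the filtration can be followed with a plain union-find structure: a connected component dies at the scale where an edge merges it into another component. The sketch below is a pure-Python stand-in and not the Gudhi-based code in [4]. It assumes that pairs of intersections without a speed reading are never connected, and all names and numbers are hypothetical.

```python
def zero_dim_barcode(nodes, edge_speeds, max_scale=50.0):
    """Return the death scales of connected components (all born at 0).

    edge_speeds maps an undirected pair (a, b) -> speed in km/h, used as
    the distance between a and b. Pairs without a reading are treated as
    never connected. Components still alive at max_scale get death = inf.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    deaths = []
    # Edges enter the filtration in order of increasing speed.
    for (a, b), speed in sorted(edge_speeds.items(), key=lambda kv: kv[1]):
        if speed > max_scale:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            deaths.append(speed)  # one component dies when two merge
    surviving = len(nodes) - len(deaths)
    return deaths + [float("inf")] * surviving


# Hypothetical neighborhood with four intersections.
nodes = [1, 2, 3, 4]
edge_speeds = {(1, 2): 12.0, (2, 3): 25.0, (1, 3): 30.0, (3, 4): 60.0}
bars = zero_dim_barcode(nodes, edge_speeds)
```

In this toy example the edge at 30 km/h closes a loop and kills no component, and the edge at 60 km/h lies above the maximal scale, so two components survive the whole filtration.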

Persistence

As mentioned in Section 4.1.2, two types of features will be studied. The first looks at the persistence of simplexes of size 1 and more, which gives insight into how roads link together under congestion. The second studies simplexes of all sizes, which can form a signature that gives quantitative information on the state of the network.
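To make the idea of a signature concrete, the sketch below samples, at evenly spaced scales, the fraction of bars that are still alive. This particular definition is an assumption chosen for illustration (the formal definition belongs to the theory in Chapter 2), and the death values are invented.

```python
def signature(deaths, max_scale=50.0, n_points=100):
    """Sample the fraction of bars still alive on an evenly spaced grid.

    deaths holds the death scale of each bar (all bars born at 0);
    float("inf") marks a bar that never dies.
    """
    grid = [max_scale * i / (n_points - 1) for i in range(n_points)]
    total = len(deaths)
    values = [sum(1 for d in deaths if d > t) / total for t in grid]
    return grid, values


# Hypothetical death scales in km/h.
deaths = [12.0, 25.0, float("inf"), float("inf")]
grid, values = signature(deaths)
```

The result is a non-increasing step function that starts at 1 and is stored as a fixed-length vector, which is the form needed when several signatures are compared with each other.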


4.2.2 Statistical Verification

Principal Component Analysis

To test whether signatures are statistically different from each other, Principal Component Analysis will be used. Each signature is treated as a group of 100 variables, namely its values on the y-axis at 100 positions along the x-axis, which allows the transformation to two Principal Components.

Each signature will be transformed into a mapping of one value on each Principal Component.
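A minimal sketch of this projection, assuming NumPy is available and using random numbers as stand-in signatures. The thesis does not state which PCA implementation was used, so the singular value decomposition below is only one possible way to obtain the two components.

```python
import numpy as np


def project_to_two_components(signatures):
    """Project rows (one signature each) onto the first two principal components."""
    X = np.asarray(signatures, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered variables
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic stand-in data: 12 signatures with 100 sampled values each.
rng = np.random.default_rng(0)
fake_signatures = rng.random((12, 100))
scores = project_to_two_components(fake_signatures)
```

Each row of `scores` holds the two values that a signature is mapped to, one per Principal Component.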

Welch’s T-Test

To test whether the distributions on the Principal Components are significantly different from each other, Welch’s T-test will be used to compare signatures pair-wise. Pair-wise means that if multiple sets of signatures are given, each one is tested against all the others, one by one.
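The pair-wise comparison can be sketched with the standard library alone. The code computes Welch’s t statistic and the Welch-Satterthwaite degrees of freedom for every pair of groups; the scores are invented, and turning the statistic into a p-value additionally needs the t-distribution, for instance through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from itertools import combinations
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = variance(sample_a) / len(sample_a)  # squared standard error of mean a
    var_b = variance(sample_b) / len(sample_b)  # squared standard error of mean b
    t_stat = (mean(sample_a) - mean(sample_b)) / (var_a + var_b) ** 0.5
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


def pairwise_welch(groups):
    """Compare every pair of groups; groups maps a label to a list of scores."""
    return {
        (a, b): welch_t(groups[a], groups[b])
        for a, b in combinations(sorted(groups), 2)
    }


# Hypothetical scores on Principal Component 1 for three hours of the day.
pc1_scores = {
    "02:00": [10.1, 9.8, 10.4, 10.0],
    "10:00": [-4.9, -5.2, -5.0, -4.7],
    "16:00": [-5.1, -4.8, -5.3, -4.9],
}
results = pairwise_welch(pc1_scores)
```

Unlike Student’s t-test, Welch’s version does not assume that the two groups have equal variance, which is why the degrees of freedom are estimated from the data.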


5 Results

As stated in the previous section, the results are divided into three parts, each corresponding to one of the defined RQs.

5.1 Research Question 1

RQ1: Can the Method be applied to traffic networks in urbanized developing cities and give qualitative insight?

The area studied is Downtown Nairobi. Looking at figure 5.1, figure 5.2 and figure 5.3, one can see the clusters of traffic (i.e. congestion) slowly appearing as the filtration scale t increases.

Each cluster has a corresponding bar in the barcode.

To read the figures, it is necessary to understand that the map and the barcode go hand in hand. Looking at the map, one can see clusters of roads slowly appearing; in the beginning, they are often not connected to one another. However, as the scale t increases, single roads turn into bigger clusters. Each cluster is represented by a bar and has a birth and a death.

The figures are based on aggregated data from Q1 2019 at 10:00 to 11:00.

Figure 5.1: Map and Barcode for t ∈ [0, 13], Downtown Nairobi Q1 2019. Panels: (a) Map, with scale Speed [km/h]; (b) Barcode, with axes Speed [km/h] and BarID.

Figure 5.2: Map and Barcode for t ∈ [0, 25]. Panels as in figure 5.1.

Figure 5.3: Map and Barcode for t ∈ [0, 35]. Panels as in figure 5.1.

5.2 Research Question 2

RQ2: Can the Method differentiate between categories of traffic congestions?

To answer RQ2, persistent signatures for Downtown Nairobi will be studied. The traffic is sampled for the weekdays of March and grouped by hour of the day, where each hour should, by assumption, correspond to a different traffic level.

5.2.1 Persistent Signatures

Although signatures for all 24 hours of the day were found, only four are shown here for ease of reading. They can be seen in figure 5.4. For each signature, the mean is shown by a solid line. The shaded area around it accounts for 95% of the variance in the sampled signatures.

The signatures appear to differ from one another and to be stable, as seen from the mean and the variance around it.

Figure 5.4: Signatures for four different hours of the day for weekdays in March 2019. Axis: Speed [km/h]; legend (Hours): 2:00, 6:00, 10:00, 16:00.

5.2.2 Principal Component Analysis

To show that the signatures are statistically different from one another, the first and second Principal Components were found. Every weekday in March produces 24 signatures, one for each hour, which gives a total of 504 signatures. Each signature can be mapped to two values, one for the first Principal Component and one for the second. In figure 5.5, signatures from four different hours can be studied.

From the histograms, it is clear that the distribution on each axis is concentrated, as expected.
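The total of 504 signatures stated above is consistent with 21 weekdays of 24 hourly signatures each. A short check with Python’s standard `calendar` module, which counts Monday through Friday and does not account for public holidays:

```python
import calendar


def count_weekdays(year, month):
    """Count Monday-to-Friday days in a month (public holidays are ignored)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) < 5  # 0 = Monday, ..., 4 = Friday
    )


weekdays = count_weekdays(2019, 3)
total_signatures = weekdays * 24  # one signature per hour of each weekday
```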

Figure 5.5: Principal Component Analysis of signatures generated from weekdays in March 2019. Axes: Principal Component 1 and Principal Component 2, with histograms along each axis; legend (Hours): 2:00, 6:00, 10:00, 16:00.

5.2.3 Welch’s T-Test

Welch’s T-test is used to show that the levels of traffic, i.e. the hours of the day, produce signature diagrams that are statistically different from one another. In the chart in figure 5.6, the sets of signatures are tested pair-wise against one another to show that they are sampled from different distributions. The figure shows the p-values, i.e. how likely the observed difference would be if the two sets were drawn from the same distribution. Note that since almost all the signatures were statistically different from one another, the figure has a color scheme that accentuates the four times at which signatures look too similar.

As there is only a small area, around 10:00 to 16:00, where the difference was not statistically significant, one can turn to Principal Component 2 to see whether the variance between the signatures is captured in that dimension. This comparison of Principal Component 1 and 2 is shown in figure 5.7. However, there seems to be no obvious improvement from turning to the second Principal Component.

Figure 5.6: P-values from Welch’s T-test for Principal Component 1. Axes: hour of the day against hour of the day (0:00 to 22:00); color scale: p-value from 0 to 0.1.

Figure 5.7: Comparison of section for PC1 and PC2. Panels: (a) PC1; (b) PC2, each covering the hours 10:00 to 18:00 with a p-value color scale from 0 to 0.1.

5.3 Research Question 3

RQ3: Can the Method be used to detect advancements of road infrastructure?

In January 2019, a traffic light was installed at the Juja Road/Muratina intersection. To measure the effect of the traffic light, the method compares the quarters of 2018 to the quarters of 2019.

The following two signature diagrams were obtained by sampling traffic at 10:00 in the morning for the eight quarters of the two years.

What can be seen is that the traffic in 2019 was generally slower than in 2018, which goes against the expected notion that a traffic light should speed up traffic by better controlling the flow.

Welch’s T-test was unable to show that the two signature diagrams were statistically different from each other at the p < .05 level after performing the PCA. This likely has to do with the small sample size for the two years, as there are only four quarters in a year.

Figure 5.8: Signatures of Juja Road/Muratina junction, 2018 & 2019 at 10:00. Axis: Speed [km/h]; legend (Years): 2018, 2019.

6 Discussion

6.1 Discussion of Results

6.1.1 Research Question 0

RQ0: What role does data have in context to the Method?

It is clear that the data sources used are far from perfect. However, given their low cost and ease of use, it is evident that they carry substantial value. A method derived from such data sources and used for infrastructure planning is needed for cities to make the next leap into the realm of Smart Cities. Moreover, developing cities look to be no worse off for this advancement on a technology level. However, as pointed out by previous research in the field, developing cities might have a greater challenge ahead of them on a policy level.

6.1.2 Research Question 1

RQ1: Can the Method be applied to traffic networks in urbanized developing cities and give qualitative insight?

The barcode and the corresponding map give qualitative insight into how the traffic spreads in the city. For the neighborhood shown, Downtown Nairobi, there is a central cluster that spreads northeast until it finally captures the city.

The barcode has multiple longer bars. This gives evidence of some noise in the data, in the form of roads that are not connected to the main cluster.

What seems to be needed is a way to filter out clusters which are never connected. Furthermore, there seems to be an issue with a lack of data on small roads, less than a few meters long, which connect larger roads to each other. This sometimes hinders clusters from connecting when it seems evident that there has to be congestion between them but the data is lacking.

6.1.3 Research Question 2

RQ2: Can the Method differentiate between categories of traffic congestions?

It is clear that we can differentiate between some levels of congestion in the network. However, for hours in the afternoon this proves more difficult. This could either be because the method is not good enough to detect these variations, or because there is no clear difference in the traffic in the afternoon.

There is no simple way of knowing the right answer to this. A driver probably has a subjective, and emotional, opinion about how bad the traffic situation can be and how it differs in the afternoon. However, this does not mean that it should be seen as that different from a traffic management point of view.

6.1.4 Research Question 3

RQ3: Can the Method be used to detect advancements of road infrastructure?

Since the method works on empirical data in the context of a complex system, this was expected to be difficult to prove. The outcome is that the neighborhood tested, with the installation of a traffic light, seems to have more traffic after the installation. This, as the signatures lie further to the left, indicates that roads in the network have a generally lower speed.

Unfortunately, due to the scope of this project, there were some difficulties getting hold of accurate and trustworthy data on the changes which have been made to the city.

Furthermore, the choice of looking at quarterly aggregated data proved poor from a statistical point of view, as less data increases the level of uncertainty.

6.2 Further Research

This report gives hope for the connection between Topological Data Analysis and traffic networks. The most promising parts are the use of cluster analysis, with the map in connection to a barcode, and the use of signatures to describe the state of a network.

From this, it would be interesting to see if the method holds for other cities, both in developed and developing countries.

Furthermore, it would be interesting to see how signatures of different cities compare and if there is a way to compare traffic in cities with the use of these.


However, a bottleneck for projects similar to this one is, for the moment, data: both the lack of it and the lack of transparency in how it is collected.

6.3 Conclusion

The potential of big data in city management for developing cities should be clear.

Smart City is a label that does not exclude developing countries, which should be seen as the latent truth underlying this report.

Furthermore, the use of Topological Data Analysis within the field of traffic management seems plausible. The method shows ambivalent results, which suggests that there is a greater story concealed which is not yet captured.

What is yet to be seen is the implementation of a tool which uses the core Method described in this report. The realization of such a tool is at a standstill until the relay baton is picked up and carried on.


References

[1] Alharbi, N and Soh, B. “Roles and Challenges of Network Sensors in Smart Cities”. In: IOP Conference Series: Earth and Environmental Science. Vol. 322. 1. IOP Publishing. 2019, p. 012002.

[2] Angelis, Jannis. ME1308 Operations Strategy for I. Nov. 2018.

[3] Atieno, Barbara. Traffic Jam: Kenya Loses 18.25 Billion Annually | Science Africa. 2020. URL: https://scienceafrica.co.ke/traffic-jam-kenya-loses-18-25-billion-annually/#.

[4] Bojs, Eric. URL: https://github.com/EricBojs/Barcoding-Nairobi/.

[5] Brabham, Daren C. Crowdsourcing. MIT Press, 2013.

[6] Brunekreef, Bert et al. “Effects of long-term exposure to traffic-related air pollution on respiratory and cardiovascular mortality in the Netherlands: the NLCS-AIR study.” In: Research report (Health Effects Institute) 139 (2009), pp. 5–71.

[7] Carlsson, Gunnar and Ishkhanov, Tigran. “A Topological Analysis of the Space of Natural Images”. In: 2007.

[8] Chachólski, Wojciech. SF2956 Topological Data Analysis. Sept. 2019.

[9] Chachólski, Wojciech and Riihimäki, Henri. “Metrics and Stabilization in One Parameter Persistence”. In: SIAM Journal on Applied Algebra and Geometry 4.1 (2020), pp. 69–98. DOI: 10.1137/19M1243932. eprint: https://doi.org/10.1137/19M1243932. URL: https://doi.org/10.1137/19M1243932.

[10] Constantiou, Ioanna D and Kallinikos, Jannis. “New games, new rules: big data and the changing context of strategy”. In: Journal of Information Technology 30.1 (2015), pp. 44–57.

[11] Ghrist, Robert. “Barcodes: the persistent topology of data”. In: Bulletin of the American Mathematical Society 45.1 (2008), pp. 61–75.

[12] Lu, Zhenqiu and Yuan, Ke-Hai. “Welch’s t test”. In: Jan. 2010, pp. 1620–1623. DOI: 10.13140/RG.2.1.3057.9607.

[13] Main Page. URL: https://wiki.osmfoundation.org/wiki/Main_Page.

[14] Maps, Missing. MissingMaps. URL: https://www.missingmaps.org/osmstats/.

[15] McCarthy, Niall. The World’s Worst Cities For Traffic Congestion [Infographic]. 2019. URL: https://www.forbes.com/sites/niallmccarthy/2019/06/05/the-worlds-worst-cities-for-traffic-congestion-infographic/#75a9b85d12bc.

[16] Movement, Uber. Uber Movement: Speeds Calculation Methodology. May 2019.

[17] Movement, Uber. Why are we doing this? Apr. 2020. URL: https://movement.uber.com/faqs?lang=en-US.

[18] ODI. Road safety in Nairobi: at the crossroads. 2020. URL: https://www.odi.org/features/securing-safe-roads/road-safety-nairobi.

[19] Open Science (Open Access). Feb. 2020. URL: https://ec.europa.eu/programmes/horizon2020/en/h2020-section/open-science-open-access.

[20] OpenStreetMap Contribution Analysis. URL: https://mapbox.github.io/osm-analysis-collab/osm-quality.

[21] Pavlenko, Tatjana. SF29130 Regression Analysis. Feb. 2020.

[22] Rubin, Victoria and Lukoianova, Tatiana. “Veracity roadmap: Is big data objective, truthful and credible?” In: Advances in Classification Research Online 24.1 (2013), p. 4.

[23] Sunday, Frankline. Why end of Nairobi traffic chaos is nowhere in sight : The Standard. 2018. URL: https://www.standardmedia.co.ke/article/2001305571/why-end-of-nairobi-traffic-chaos-is-nowhere-in-sight.

[24] Vu, Khuong and Hartley, Kris. “Promoting smart cities in developing countries: Policy insights from Vietnam”. In: Telecommunications Policy 42.10 (2018), pp. 845–859.

[25] Wiki, OpenStreetMap. Quality assurance - OpenStreetMap Wiki. [Online; accessed 2-April-2020]. 2020. URL: https://wiki.openstreetmap.org/w/index.php?title=Quality_assurance&oldid=1952028.

[26] Wu, Y. et al. “Congestion barcodes: Exploring the topology of urban congestion using persistent homology”. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). 2017, pp. 1–6.
