Learning wireless channel models to design real-time communications from vehicles


Master Thesis

HALMSTAD UNIVERSITY

Master's Programme in Embedded and Intelligent Systems, 120 credits


Embedded systems, 30 credits

Halmstad 2018-06-01

Wouter Dankers


Wouter Dankers: Learning wireless channel models to design real-time communications from vehicles, © June 2018

Supervisors: Alexey Vinel and Maben Rabi

Location: Halmstad, Sweden

Time frame: June 2018


ABSTRACT

The aim of this project is to analyze a data log of wireless packet traffic, and to produce: (1) models for dynamic fluctuations in wireless channel and link quality, and, (2) a design for real-time communications over the given wireless channel. The models and designs are useful in setting up real-time communications in a vehicular test track (www.astazero.com). The data log came from experimental measurements at the test track. From this data, we fitted simple models for packet losses and retransmissions in the wireless communication system. These models took the form of a combination of a statistical model for the packet losses, with a deterministic model for retransmissions of lost packets. Such fitted models can be used to predict the average quality of vehicle position monitoring based on periodically transmitted position information. Such predictions help us decide the feasibility of safe and reliable conduct of testing with two or more moving test objects.


ACKNOWLEDGEMENTS

I would first like to thank my supervisors Alexey Vinel and Maben Rabi of the School of Information Technology at Halmstad University.

The door to their office was always open whenever I ran into a trouble spot or had a question about my work. They consistently allowed this thesis to be my own work, but steered me in the right direction whenever they thought I needed it.

I would also like to thank the experts who were involved in the AstaMoCa research project. Without their participation and input, this thesis project could not have been successfully conducted.

Finally, I must express my very profound gratitude to my parents for providing me with unfailing support and continuous encouragement throughout my years of study. This accomplishment would not have been possible without them. Thank you.


CONTENTS

List of Figures
List of Tables

1 Introduction
2 Background
    2.1 V2X communication
    2.2 LTE architecture
    2.3 LTE retransmissions
    2.4 LTE stack
    2.5 Measurement scenarios
3 Literature review
4 Method
    4.1 Phase 1
        4.1.1 LTE transmission latency
        4.1.2 Cumulative distribution function for IID losses
        4.1.3 Deficiencies of the IID loss model
    4.2 Phase 2
        4.2.1 Gilbert-Elliott model
        4.2.2 Measurement campaign
        4.2.3 Data post-processing
        4.2.4 Data matching
        4.2.5 Model creation
    4.3 Phase 3
5 Conclusion and recommendations
Bibliography


LIST OF FIGURES

Figure 1 V2X communication
Figure 2 Simplified LTE architecture
Figure 3 HARQ retransmissions
Figure 4 LTE stack
Figure 5 LTE transmission diagram
Figure 6 CDF
Figure 7 CDF zoom in
Figure 8 CDF model compared with the ground truth
Figure 9 Gilbert-Elliott model
Figure 10 Test setup of AstaZero's researchers day
Figure 11 Speed and transmission time over time
Figure 12 Histogram of transmission times from each test object
Figure 13 Histogram of transmission times from the test objects combined
Figure 14 Speed and transmission time over time
Figure 15 Combined CDF compared with the earlier measurements
Figure 16 CDF offset plots
Figure 17 Retransmission spread


LIST OF TABLES

Table 1 ITS applications and use cases for active road safety
Table 2 Total uplink time latency
Table 3 Probability mass values for P = 0.2
Table 4 Gilbert-Elliott model parameters
Table 5 Gilbert-Elliott model example parameters
Table 6 Gilbert-Elliott model result
Table 7 Pattern learning example result
Table 8 Summary of the AstaZero test log data
Table 9 Results of Gilbert's approximation
Table 10 Baum-Welch parameter learning


LISTINGS

Listing 1 Python code for parsing the log file
Listing 2 Python code for calculating speed from GPS coordinates
Listing 3 Python code finding the bin number
Listing 4 Matlab script for HMM parameter learning


1 INTRODUCTION

This thesis project is centered on the AstaZero test track (www.astazero.com). AstaZero is the world's first full-scale test environment that makes it possible for vehicle manufacturers, suppliers, legislators, and universities to test their systems. Its focus is on testing advanced safety systems under different traffic situations. The name AstaZero combines the abbreviation Asta (Active Safety Test Area) with zero, which refers to the Swedish Parliament's vision of zero traffic casualties.

AstaZero currently lacks, and would like to have, full network coverage with little to no packet loss at an acceptable throughput for safety-critical data (e.g. CAM messages). The aim of this thesis is to address this problem by analyzing a data corpus that may come from experimental measurements at the test track or from simulations. From this data, we shall try to fit simple models for packet losses and retransmissions in the wireless communication system. These models may take the form of a combination of a statistical model for the packet losses with a deterministic model for retransmissions of lost packets. Such fitted models will be used to predict the average quality of vehicle position monitoring based on periodically transmitted position information. Such predictions help us decide the feasibility of safe and reliable conduct of testing with two or more moving test objects.

The rest of this document is structured as follows:

• Chapter 2: Provides the background needed to fully understand the later parts of the thesis.

• Chapter 3: Presents previous work done in the context of the thesis.

• Chapter 4: Provides insight into what has been produced and all the steps required to reach those results.


2 BACKGROUND

This chapter presents background for the research presented in this thesis. A review of V2X communication can be found in Section 2.1. Section 2.2 provides an introduction to the LTE architecture and most of its components. The core concepts of retransmission schemes in LTE are explained in Section 2.3, and the protocol stack in Section 2.4. Section 2.5 presents some possible measurement scenarios and their applications. Background for the Gilbert-Elliott model is not covered in this chapter but is presented in Chapter 4.

2.1 V2X communication

V2X (vehicle-to-everything) communication is based on WLAN technology and works directly between vehicles, which form a vehicular ad hoc network as soon as two V2X senders come within each other's range. Hence it does not require any infrastructure for vehicles to communicate, which is key to assuring safety in remote or sparsely developed areas. WLAN is particularly well suited for V2X communication because of its low latency. It transmits messages known as Cooperative Awareness Messages (CAM) and Decentralized Environmental Notification Messages (DENM), or Basic Safety Messages (BSM). Four main types of vehicle communication, each with its own features and application areas, are grouped under the V2X umbrella. These four categories are visualized in Figure 1.

• V2V (Vehicle-to-Vehicle) communication: also known as VANETs (vehicular ad hoc networks), where two vehicles communicate with each other directly to provide information such as safety warnings and traffic information.

• V2P (Vehicle-to-Pedestrian) communication: here the vehicle knows the position of the surrounding pedestrians. This type of communication could be of great interest in city areas, where the vehicle can take notice of unpredictable pedestrians in its close vicinity.

• V2I (Vehicle-to-Infrastructure) communication: a broad term where infrastructure can include tollgates, traffic lights, highway entrance ramps, etc. As an example, take a vehicle that is driving towards a road junction; the roadside unit (traffic lights) can, for instance, inform the car of the time left until the light turns green or red. The vehicle can then adjust its speed depending on its planned trajectory so that it does not need to stop and waste precious fuel.

• V2N (Vehicle-to-Network) communication: used when a vehicle is in direct communication with a server connected to the internet. V2N is used for file downloads, internet access, etc.

Figure 1: V2X communication

The last category of vehicle communication (V2N) is the type of communication this thesis project focuses on. All tests at the AstaZero track require the vehicle to be in direct communication with a server located in the core network. Direct communication is needed to upload a given test to the vehicle, update the trajectory of the vehicle, monitor the vehicle in real time, etc.

2.2 LTE architecture

Long-Term Evolution (LTE) is a high-speed wireless data communication standard for mobile devices, based on the GSM/EDGE and UMTS/HSPA technologies. The standard was developed by 3GPP (3rd Generation Partnership Project) and is specified in its Release 8 document series (with minor adjustments in Release 9). LTE is commonly marketed as 4G even though it does not meet the technical 4G criteria. Only with the introduction of LTE-A (LTE-Advanced) and the improvements of WiMAX-Advanced can the technology properly be called 4G. To differentiate the two standards, ITU-R (International Telecommunication Union Radiocommunication Sector) has defined LTE-A as "True 4G". The LTE specifications provide:

• Downlink peak rates of 300 Mbit/sec


• Uplink peak rates of 75 Mbit/sec

• QoS (Quality of Service) provisions to ensure transfer latencies of less than 5 ms in the radio access network

• The ability to manage fast-moving mobile nodes

• Support for multicast and broadcast streams

• Scalable carrier bandwidths from 1.4 MHz to 20 MHz

• Support for both frequency-division duplexing (FDD) and time-division duplexing (TDD)

• Seamless handovers to cell towers with older network technology

LTE implements a simple architecture, which results in lower operating costs (for example, each E-UTRA cell will support up to four times the data and voice capacity supported by HSPA). Figure 2 visualizes this simple architecture, which consists of a UE (user equipment), an eNodeB (base station), and the EPC (Evolved Packet Core).

The mobile node (UE) communicates with the base station over the air. For the purposes of this thesis project, this wireless node is depicted as a vehicle. Each base station covers a certain area and is in direct contact with the core network as well as with other base stations. E-UTRAN (Evolved Universal Terrestrial Radio Access Network) is the radio access network of LTE, which includes the mobile nodes and all the base stations.

Figure 2: Simplified LTE architecture

The LTE core network has a flat, all-IP architecture called the System Architecture Evolution (SAE). The main component of the SAE architecture is the Evolved Packet Core (EPC), also known as the SAE Core. The EPC serves as the equivalent of GPRS networks (via the Mobility Management Entity, Serving Gateway and PDN Gateway subcomponents). The subcomponents of the EPC are:

• MME (Mobility Management Entity): the key control node, responsible for the idle-mode UE paging and tracking procedure, including retransmissions.

• SGW (Serving Gateway): responsible for forwarding and routing of data packets.

• PGW (PDN Gateway): provides connectivity from the UE to external networks.

• HSS (Home Subscriber Server): central database that contains user and subscription related information.

• ANDSF (Access Network Discovery and Selection Function): provides the UE with information about connectivity to 3GPP and non-3GPP access networks (such as Wi-Fi).

• ePDG (Evolved Packet Data Gateway): its main function is to secure data transmissions with a UE connected to the EPC over an untrusted non-3GPP access network.

Because of its many components and its complexity, the EPC (Evolved Packet Core) serves as a level of abstraction for the entire SAE architecture in this thesis. The focus of this thesis project is on the communication between the UE and the eNB (eNodeB). Because wireless communication is by nature not error free, nearly all retransmissions occur in this part of the network.

2.3 LTE retransmissions

In LTE, upon receiving a packet the receiver sends an automatic repeat request (ARQ) bit back to the transmitter. If the message is unreadable or corrupt, a negative acknowledgement (NACK) is sent, whereas a positive acknowledgement (ACK) is sent when the message is decodable. No packet numbering is included in the ARQ message, since LTE knows which ARQ belongs to which packet: a fixed time elapses between sending the packet and receiving the ARQ, a scheme known as a stop-and-wait procedure. However, since the transmitter stops after each transmission, the throughput is low. LTE therefore runs multiple stop-and-wait processes in parallel so that, while waiting for an ARQ from one process, the transmitter can send data on another ARQ process. When the transmitter receives a NACK, the erroneously received message can be sent again in the next available time slot, as illustrated in Figure 3.

Figure 3: HARQ retransmissions
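
The throughput benefit of running several stop-and-wait processes in parallel can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `link_utilization` and the feedback delay of 8 time slots are assumptions chosen purely for illustration. Under the simplifying assumption that each process sends in one slot and then waits a fixed number of slots for its ARQ, the fraction of slots the transmitter can fill is:

```python
def link_utilization(num_processes: int, round_trip_slots: int) -> float:
    """Fraction of time slots the transmitter can fill.

    Simplified model (an assumption of this sketch): each stop-and-wait
    process transmits in one slot and may transmit again only after
    round_trip_slots slots, when its ARQ feedback has arrived.
    """
    if num_processes < 1 or round_trip_slots < 1:
        raise ValueError("both arguments must be positive")
    # A single process uses 1 out of every round_trip_slots slots;
    # parallel processes fill the idle slots until the link is saturated.
    return min(1.0, num_processes / round_trip_slots)


# Hypothetical feedback delay of 8 slots, varying the number of processes.
for n in (1, 2, 4, 8):
    print(n, link_utilization(n, 8))
```

With a single process the transmitter is idle most of the time; once the number of processes reaches the feedback delay, every slot can be used.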

When the receiver receives an erroneous packet, the packet is not discarded. Even though it was not possible to decode the packet, the received signal still contains useful information. LTE uses hybrid ARQ (HARQ) with soft combining to reduce the number of retransmissions: the erroneously received packet is stored in a buffer memory, where it can later be combined with the retransmission to obtain a single combined packet that is more reliable than its constituents.

2.4 LTE stack

Let us take a closer look at the layers of the E-UTRAN (LTE) protocol stack. Figure 4 gives a more elaborate diagram of the LTE stack. Note that the communication stack between the eNB and the EPC is not covered, since it is not relevant for the thesis project.

Figure 4: LTE stack

• Physical Layer (PHY): carries all information from the MAC transport channels over the air interface. It takes care of link adaptation (AMC), power control, cell search (for initial synchronization and handover purposes) and other measurements (inside the LTE system and between systems) for the RRC layer.

• Medium Access Control (MAC): the MAC layer is responsible for mapping between logical channels and transport channels, multiplexing of MAC SDUs from one or different logical channels onto transport blocks (TB), scheduling information reporting, error correction through HARQ, priority handling between UEs by means of dynamic scheduling, priority handling between logical channels of one UE, and logical channel prioritization.

• Radio Link Control (RLC): RLC operates in three modes: Transparent Mode (TM), Unacknowledged Mode (UM), and Acknowledged Mode (AM). The RLC layer is responsible for transfer of upper layer PDUs, error correction through ARQ (only for AM data transfer), and concatenation, segmentation and reassembly of RLC SDUs (only for UM and AM data transfer). RLC is also responsible for re-segmentation of RLC data PDUs (only for AM data transfer), reordering of RLC data PDUs (only for UM and AM data transfer), duplicate detection (only for UM and AM data transfer), RLC SDU discard (only for UM and AM data transfer), RLC re-establishment, and protocol error detection (only for AM data transfer).

• Radio Resource Control (RRC): the main services and functions of the RRC sublayer include broadcast of system information related to the non-access stratum (NAS), broadcast of system information related to the access stratum (AS), maintenance and release of an RRC connection between the UE and the eNB, security functions including key management, and establishment, configuration, maintenance and release of point-to-point radio bearers.

• Packet Data Convergence Protocol (PDCP): the PDCP layer is responsible for header compression and decompression of IP data, transfer of data (user plane or control plane), maintenance of PDCP sequence numbers (SNs), in-sequence delivery of upper layer PDUs at re-establishment of lower layers, duplicate elimination of lower layer SDUs at re-establishment of lower layers for radio bearers mapped on RLC AM, ciphering and deciphering of user plane data and control plane data, integrity protection and integrity verification of control plane data, timer-based discard, and duplicate discarding. PDCP is used for SRBs and DRBs mapped on DCCH and DTCH types of logical channels.

2.5 Measurement scenarios

The automotive industry, together with the ETSI ITS standardization organization in Europe and the US-DOT in the USA, has specified a basic set of ITS applications that can effectively reduce the number of road accidents and traffic jams; details can be found in [1] and [2]. The cooperative road safety and traffic efficiency applications are listed in Table 1.

Applications             Use case
Road hazards             Forward collision warning (FCW)
                         Emergency electronic brake light warning (EEBW)
                         Blind spot warning (BSW)
                         Wrong way driving warning (WWDW)
                         Do not pass warning (DNPW)
                         Roadwork warning (RWW)
                         Traffic condition warning (TCW)
Cooperative awareness    Cross traffic violation warning (CTVW)
                         Intersection collision warning (ICW)
                         Left/right turn assistance at intersection (LTA)
                         Emergency vehicle warning (EVW)

Table 1: ITS applications and use cases for active road safety

With these safety applications in mind, a number of scenarios were identified where vehicular channel measurements could be performed. The possible measurement sites include:

• Rural: A rural scenario is characterized as a country road with open surroundings. The most relevant safety applications for this scenario are Forward collision warning (FCW), Emergency electronic brake light warning (EEBW), and Do not pass warning (DNPW).

• Highway: A highway scenario in general may include 2-6 lanes in each direction with variable traffic. The most relevant safety applications for this scenario are Forward collision warning (FCW), Emergency electronic brake light warning (EEBW), Roadwork warning (RWW), and Blind spot warning (BSW).

• Urban: An urban scenario is characterized by streets and roads in densely populated areas, lined on both sides with single- to multi-story buildings. The most relevant safety applications for this scenario are Forward collision warning (FCW), Emergency electronic brake light warning (EEBW), Roadwork warning (RWW), Wrong way driving warning (WWDW) and Do not pass warning (DNPW).

• Intersection: An intersection scenario is one in which two or more rural, urban, or suburban streets or roads of varying widths intersect at a certain point. The communication signal between cars approaching the intersection is often blocked by buildings of a certain height situated at the corners of the intersection. The most relevant safety applications for this scenario are Forward collision warning (FCW), Emergency electronic brake light warning (EEBW), Roadwork warning (RWW), Wrong way driving warning (WWDW) and Do not pass warning (DNPW).


3 LITERATURE REVIEW

As a starting point for the literature search, paper [3] was used as an introduction to propagation channel models; its authors discuss the fundamental limits of wireless communication. This paper is highly relevant since the core of the thesis is about increasing the channel capacity of the wireless vehicular network.

Paper [18] explains the concept of data age in vehicular networks. The authors use an experimental setup to minimize the age of information by changing the CW (contention window). Information age is minimized at an optimal operating point that lies between the extremes of maximum throughput and minimum delay.

The IEEE 802.11p standard [17] [11], used for vehicular ad hoc networks (VANETs), is seen as the standard for intelligent transportation systems (ITS) because of its easy deployment, mature technology (based on the popular WiFi standard) and low cost. However, its drawbacks are poor scalability, delays and a lack of quality of service (QoS). As a result of these concerns there has been an increasing interest in Long-Term Evolution (LTE) [9] as a potential access technology to support vehicular communication because of its high data rates and low latency for mobile users.

E. Dahlman's book [6] (4G LTE-Advanced Pro and The Road to 5G) provides real insight into the why and how of the standard and its related technologies.

Paper [19], in combination with Dahlman's book, provides a good introduction to most of the latency parameters in an LTE system.

Paper [12] introduces path-loss models based on extensive testing in four different environments (rural, urban, suburban and highway) to find parameters for V2V (Vehicle-to-Vehicle) communication systems. This is particularly important for the analysis of interference and scalability in such networks. The authors mention that their study confirms previously executed studies. However, more tests and measurements are required to arrive at a general model of the channel parameters.

Reliable traffic requires accurate models for the V2V propagation channel [4][14]. These surveys provide an overview of existing V2V channel measurement campaigns and the resulting channel characteristics (such as delay spreads and Doppler spreads [13]). The authors also describe the most commonly used types of channel characterization (the statistical and the geometry-based approach) for a V2V wireless communication environment in different scenarios. Paper [5] discusses vehicular propagation channels, including application-specific scenarios, the impact of vehicle types, and antennas. It also offers suggestions for future research and development. However, these papers focus on modeling the physical layer of the network, whereas this project focuses on modeling the data-link layer.

Multiple models for LTE systems exist that serve as a layer of abstraction to understand and mimic the physical layer of the channel. Paper [15] provides a survey of the most important developments in the area of MIMO channel modeling. However, a simpler alternative to these complicated models is a Markov chain. The Markov chain gets its power from its simplicity: the system can be described by a stochastic sequence of packet arrivals and losses. A more specific model is the Gilbert-Elliott model [8], in which the network can be either in a good state or in a bad state. The transition from one state to the other depends on probabilistic parameters that are specific to the network's health. Several extensions to the well-known Gilbert-Elliott model exist. One classical approach is to introduce an intermediate state in which the network is in neither a good nor a bad state. The open-source framework LTE-Sim [10] can be used to test and verify the Gilbert-Elliott model in different environments.
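
To make the two-state idea concrete, the following Python sketch simulates a packet loss sequence from a good/bad Markov chain. It is only an illustration under stated assumptions, not the model fitted later in the thesis: the function name, the parameter names (`p_gb`, `p_bg`, `loss_good`, `loss_bad`), the default loss probabilities and the choice to start in the good state are all hypothetical.

```python
import random


def simulate_gilbert_elliott(n, p_gb, p_bg, loss_good=0.0, loss_bad=1.0, seed=0):
    """Return a list of n booleans, where True means the packet was lost.

    p_gb: probability of moving from the good to the bad state per packet.
    p_bg: probability of moving from the bad to the good state per packet.
    loss_good / loss_bad: loss probability while in the respective state.
    """
    rng = random.Random(seed)
    in_bad_state = False  # assumption: the channel starts in the good state
    losses = []
    for _ in range(n):
        # Draw the loss outcome according to the current state.
        loss_prob = loss_bad if in_bad_state else loss_good
        losses.append(rng.random() < loss_prob)
        # Draw the state transition that applies to the next packet.
        if in_bad_state:
            if rng.random() < p_bg:
                in_bad_state = False
        elif rng.random() < p_gb:
            in_bad_state = True
    return losses


# Illustrative parameters only: rare, fairly short loss bursts.
trace = simulate_gilbert_elliott(1000, p_gb=0.01, p_bg=0.3)
print(sum(trace) / len(trace))  # empirical loss rate of this trace
```

Unlike an IID loss model, the losses produced here tend to arrive in bursts, because the chain stays in the bad state for several packets before it recovers.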


4 METHOD

This chapter is the core of the thesis. It explains all the steps, including the preliminary ones, required to create a probabilistic model from experimental measurement log files. The first section (4.1) gives a brief overview of retransmission schemes and of the process of sending a packet, with its accompanying delays, in an LTE network. That section also covers the creation of the first model and why its results are unrealistically good. Section 4.2, the second part of the thesis, focuses on what can be done with the data from AstaZero as well as on how to set up a measurement campaign. That section concludes with yet another model that does not meet the requirements. The last section (4.3) presents a working probabilistic network model generated from the work done in Section 4.2 by using the Baum-Welch algorithm to train hidden Markov models.

The creation of the models follows the principle of Occam's razor: when presented with competing hypothetical answers to a problem, one should select the answer that makes the fewest assumptions. In science, Occam's razor is used as a heuristic guide in the development of theoretical models. Since one can always burden failing explanations with ad hoc hypotheses to prevent them from being falsified, simpler theories are preferable to more complex ones because they are more testable.

4.1 Phase 1

This was the first stage of the thesis, in which the theoretical average transmission latency of the LTE network had to be found and used as a reference for the x-axis of the CDF (Section 4.1.1). The calculation of the CDF, which takes different packet error rates into consideration, can be found in Section 4.1.2. The results of this model are discussed in Section 4.1.3.

4.1.1 LTE transmission latency

The theoretical average latency is mostly determined by the transition from the idle state to the connected state, which takes roughly 10 ms. During this transition the UE establishes a connection with the core network (EPC) via the base station (eNodeB) using a three-way handshake protocol, which is also called the request-to-send (RTS) and grant-to-send (GTS) stage. The three-way handshake is visualized as a sequence diagram in Figure 5.

Figure 5: LTE transmission diagram

Besides the RTS and GTS delay, some other factors need to be included in order to find the total round trip delay. The round trip time is the time it takes for a single packet to be sent from an edge node (UE) to the server that is connected to the core LTE network (EPC). All the factors included in the round trip time and their associated values are listed in Table 2.

Tx processing prepares the message coming from the application layer for delivery to the server. This includes RLC segmentation, MAC logical channel creation, MAC scheduling, message encoding, etc., whereas Rx processing includes message decoding and RLC concatenation.

Latency component                     Value       Symbol
RTS / GTS                             10 ms       T_rts
Tx processing                         1 ms        T_Txeu
Data transmission (UE to eNodeB)      0.67 ms     T_Peu
Rx processing (eNodeB)                0.5 ms      T_Rxnb
ARQ                                   ≈0 ms       T_arq
Data transmission (eNodeB to EPC)     1 ms        T_Pnb
Rx processing (EPC)                   0.5 ms      T_Rxgw
Total                                 13.67 ms    T_tt

Table 2: Total uplink time latency

The propagation delay for transferring a data frame between the UE and the eNodeB is set to its maximum value of 0.67 ms. This is because LTE is designed to handle a cell range of at most 100 km, which corresponds to a maximal timing delay of 0.67 ms. The base station (eNodeB) is connected to the LTE core network (EPC) by wire, so it is assumed that the latency is dominated by the propagation delay. Given the propagation speed in copper cables (200,000 km/s), a distance of 200 km between two nodes (the EPC and the eNodeB) results in a latency of 1 ms. The choice of network topology therefore has a significant impact. In the case of AstaZero this propagation delay could possibly be negligible. The propagation delay between the core LTE network (EPC) and the server is assumed to be negligible, since the two are most likely not very far from one another. Another factor that can be neglected is the propagation delay of the ARQ message, although the maximum value it could take is 0.67 ms. Retransmissions in the wired network are neglected since they are very unlikely to occur.
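
The two propagation delays quoted above follow from dividing distance by propagation speed, as the short Python sketch below shows. Two points are assumptions of this sketch rather than statements from the thesis: the radio link is taken to propagate at the free-space speed of light, and the 0.67 ms figure is read as the round trip over a 100 km cell range. The function name is hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # assumed free-space speed for the radio link
SPEED_IN_COPPER_KM_S = 200_000.0   # propagation speed in copper quoted in the text


def propagation_delay_ms(distance_km: float, speed_km_s: float) -> float:
    """One-way propagation delay in milliseconds."""
    return distance_km / speed_km_s * 1000.0


# Radio link: round trip over the maximum 100 km cell range (assumed reading).
radio_round_trip_ms = 2 * propagation_delay_ms(100.0, SPEED_OF_LIGHT_KM_S)

# Wired link: one-way delay over 200 km of copper between eNodeB and EPC.
backhaul_ms = propagation_delay_ms(200.0, SPEED_IN_COPPER_KM_S)

print(round(radio_round_trip_ms, 2), round(backhaul_ms, 2))
```

Shortening the wired distance scales the second delay linearly, which is why the backhaul contribution may be negligible for a compact site.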

The total transmission time for one packet starting from idle state corresponds to the following formula.

\[ T_{tt} = T_{rts} + T_{Txeu} + (T_{Peu} + T_{Rxnb})\,(1 + r) + T_{Pnb} + T_{Rxgw} \]

where r is the number of retransmissions. After inserting all the constants from Table 2, this equals:

\[ T_{tt} = 12.5 + 1.17\,(1 + r) \]

In case of a clean transmission (no packet losses) the total transmission time equals 13.67 ms. In case of retransmissions the delay increases by the sum of T_Peu and T_Rxnb (1.17 ms) for each retransmission. However, if there is more than one packet in the queue, the time per (re)transmission grows with the number of subframes in use (1 subframe = 1 ms). This is due to the stop-and-wait principle applied in HARQ: all subframes need to have received an ARQ before any new or old subframe can be transmitted or retransmitted. Hence the next available scheduled retransmission slot comes after all the other subframes have received an ARQ.

\[ T_{tt} = T_{rts} + T_{Txeu} + (T_{Peu} + T_{Rxnb} + (SF - 1))\,(1 + r) + T_{Pnb} + T_{Rxgw} \]

\[ T_{tt} = 12.5 + (1.17 + (SF - 1))\,(1 + r) \]

where SF is the number of subframes used.
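
The latency formula can be evaluated directly, as in the following minimal Python sketch. It uses the constants from Table 2 and reads the general formula as adding (SF − 1) ms to every transmission attempt, so that SF = 1 reproduces the single-packet case. The function and constant names are hypothetical and chosen for illustration only.

```python
# Latency constants from Table 2, in milliseconds.
T_RTS = 10.0    # RTS / GTS handshake
T_TX_UE = 1.0   # Tx processing in the UE
T_P_UE = 0.67   # data transmission, UE to eNodeB
T_RX_NB = 0.5   # Rx processing in the eNodeB
T_P_NB = 1.0    # data transmission, eNodeB to EPC
T_RX_GW = 0.5   # Rx processing in the EPC


def total_latency_ms(retransmissions: int, subframes: int = 1) -> float:
    """Total uplink latency for r retransmissions and SF subframes in use."""
    if retransmissions < 0 or subframes < 1:
        raise ValueError("need retransmissions >= 0 and subframes >= 1")
    fixed = T_RTS + T_TX_UE + T_P_NB + T_RX_GW        # 12.5 ms, paid once
    per_attempt = T_P_UE + T_RX_NB + (subframes - 1)  # 1.17 ms when SF = 1
    return fixed + per_attempt * (1 + retransmissions)


for r in range(4):
    print(r, round(total_latency_ms(r), 2))
```

Keeping the fixed part separate from the per-attempt part makes it easy to see that only the radio hop is repeated when a packet is retransmitted.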

4.1.2 Cumulative distribution function for IID losses

The cumulative distribution function (CDF) gives the probability that a value is less than or equal to a given number. In our case the question could be: what is the probability that the transmission delay of a packet is less than or equal to 15.6 ms, given that the packet error rate (p) equals 0.1? With X denoting the number of retransmissions, the CDF can be written as

\[ P_r(X \le n) = \sum_{i=0}^{n} P_r(X = i) \]

The first step is to construct a probability mass function. In probability theory and statistics, a probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value. In this project we assume that losses are independent and identically distributed (IID). The PMF for n retransmissions, with the parameter p equal to the packet error rate, can be written as:

\[
P_r(n) =
\begin{cases}
P_r(X = 0) = 1 - p \\
P_r(X = 1) = p\,(1 - p) \\
P_r(X = 2) = p^{2}\,(1 - p) \\
\quad\vdots \\
P_r(X = n) = p^{n}\,(1 - p)
\end{cases}
\]

Consider an example where the packet error rate equals 0.2. Inserting p into the formula for n = 0, 1, 2, 3 gives the four probability mass values listed in Table 3.


n    P_r(n)        Value
0    1 − p         0.8
1    p(1 − p)      0.16
2    p²(1 − p)     0.032
3    p³(1 − p)     0.0064

Table 3: Probability mass values for P = 0.2

These values keep decreasing and come closer and closer to zero as n grows. If we let n go to infinity and sum all the values, the sum equals 1. For n = 3 the sum equals 0.9984, which is fairly close to 1 but not quite there yet. Returning to the CDF: what is P_r(X ≤ n)? It equals the sum of all the probability mass values up to and including n. For example, P_r(X ≤ 3) = (1 − p) + p(1 − p) + p²(1 − p) + p³(1 − p) = 0.9984.
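
The PMF and CDF for IID losses can be checked numerically with a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes, as in the text, that each transmission attempt fails independently with probability p.

```python
def retransmission_pmf(n: int, p: float) -> float:
    """P(X = n): n failed attempts followed by one successful attempt."""
    return p ** n * (1 - p)


def retransmission_cdf(n: int, p: float) -> float:
    """P(X <= n): sum of the probability mass values from 0 up to n."""
    return sum(retransmission_pmf(i, p) for i in range(n + 1))


p = 0.2
for n in range(4):
    print(n, retransmission_pmf(n, p), retransmission_cdf(n, p))
```

Because the terms form a geometric series, the sum also has the closed form 1 − p^(n+1), which gives a quick cross-check of the tabulated values.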

Figure 6 shows the CDF plotted for five different packet error rates (p). From this graph it is possible to read off the probability that a packet has been transmitted successfully within a certain time. For example, the probability that a packet has arrived successfully after 14 ms with a packet error rate of 0.2 is ≈80%. Comparing the different p values shows that the CDF increases as p decreases. At 13.67 ms the curve starts to increase very rapidly. This is because P_r(T_tt < 13.67 ms) = 0: the minimum time needed to transmit a packet is 13.67 ms, so in this model it is not possible to transmit a packet in a shorter time. In practice a shorter time could be possible, since the value 13.67 ms is based on the assumption of maximum delays over the entire network.

Figure 6 also tells us that the probability of a successful packet transmission is very close to 1 for all the packet error rates after ≈18 ms. This means that for the highest p value it takes roughly 4 retransmissions to successfully decode a received packet. The number of retransmissions (r) can be found by inserting the known parameters into the formula described in Section 4.1.1, since we know that T_tt increases by 1.17 ms for each retransmission.

\[
\begin{aligned}
T_{tt} &= 12.5 + 1.17\,(1 + r) \\
18 &= 12.5 + 1.17\,(1 + r) \\
r &= \frac{18 - 12.5}{1.17} - 1 \\
r &= 3.7 \approx 4
\end{aligned}
\]
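The same arithmetic can be written as a pair of small helper functions. This is a sketch that simply encodes the formula above; the constant and function names are hypothetical, and the values 12.5 ms and 1.17 ms are taken from the text.

```python
BASE_DELAY_MS = 12.5     # fixed part of the total transmission time
PER_ATTEMPT_MS = 1.17    # time added by every (re)transmission


def total_transmission_time_ms(r):
    """T_tt for r retransmissions: one first attempt plus r repeats."""
    return BASE_DELAY_MS + PER_ATTEMPT_MS * (1 + r)


def retransmissions_within(deadline_ms):
    """Invert the formula: how many retransmissions fit before the deadline."""
    return (deadline_ms - BASE_DELAY_MS) / PER_ATTEMPT_MS - 1


print(total_transmission_time_ms(0))
print(retransmissions_within(18))
```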


Figure 6: CDF

Figure 7 zooms in on the region where most of the curves meet in Figure 6 (between 0.995 and 1 on the Y-axis and between 13.5 and 17.5 on the X-axis). Now you can see that there is a very small difference in probability between p = 10⁻⁴ and p = 10⁻³, which was not visible in Figure 6. It also looked as if the CDF for p = 10⁻⁴ was 1 almost straight from the start; in Figure 7 you can see that this is not true, although it is very close to 1.

4.1.3 Deficiencies of the IID loss model

As expected, the simplistic CDF calculations fail when compared with the ground truth. This is visible when comparing the model CDF (Figure 8a) with the early experimental measurements at the test track (Figure 8b), which plot the RTT (Round Trip Time) needed to set up a TCP (Transmission Control Protocol) connection.

The reason why the CDF fails to follow the ground truth curves is that our calculations take only one parameter (the packet error rate) into consideration. In the future more parameters will be included in the calculation to follow the experimental measurements as closely as possible. One such parameter could be how the network was behaving a couple of time steps earlier. These, together with some other parameters, will be extracted from the experimental measurements that the project members from Halmstad University are going to conduct at the test track.


Figure 7: CDF zoom in

(a) CDF model (b) Ground truth

Figure 8: CDF model compared with the ground truth

4.2 Phase 2

Phase 2 was the second stage of the thesis. It starts with some theory about the Gilbert-Elliott model and the minimum requirements for executing a decent measurement campaign (Sections 4.2.1 and 4.2.2). The second part covers the steps needed to construct a bit stream of packet losses from a data log file (Sections 4.2.3 and 4.2.4). Finally, the results of the Gilbert-Elliott model are presented (Section 4.2.5.1).

4.2.1 Gilbert-Elliott model

The Gilbert-Elliott model is a classical two-state Markov model which was introduced by Gilbert [8] and Elliott [7] in the early sixties in order to model burst noise in wireless telephone circuits. The Gilbert-Elliott model is especially powerful and widely used for describing error patterns in transmission channels and for analyzing the efficiency of coding for error detection and correction. The model shown in Figure 9 consists of a good (G) and a bad (B) state, each with its transition probabilities, and each state may generate independent errors. A description of the parameters can be found in Table 4.

Figure 9: Gilbert-Elliott model

Gilbert suggested estimating the model parameters from three measurable quantities of a binary error process {E_t}, where E_t = F indicates an error:

\[
a = P_r(F); \quad b = P_r(F \mid F); \quad c = \frac{P_r(FFF)}{P_r(FSF) + P_r(FFF)} \tag{1}
\]

Knowing a, b, and c, the three model parameters can be computed in the following manner:

\[
1 - r = \frac{ac - b^{2}}{2ac - b(a + c)}; \quad h = 1 - \frac{b}{1 - r}; \quad p = \frac{ar}{1 - h - a} \tag{2}
\]

When the trace offers too few observations to estimate c reliably, Gilbert argues that the c measurement can be avoided by choosing h = 0.5 and replacing 1 − r from Equation 2 with 1 − r = 2b. The parameters of an even more simplified Gilbert model (simple Gilbert) are:

\[
p = P_r(F \mid S); \quad r = P_r(S \mid F) \tag{3}
\]

Another way of estimating the parameters is to use the Average Burst Error Length (ABEL) to determine r and the average number of packet drops to determine p_E:

\[
r = \frac{1}{ABEL}; \quad p = \frac{p_E \, r}{h - p_E} \tag{4}
\]


Parameter   Description
1-k         Packet error rate of the Good state
1-h         Packet error rate of the Bad state
p           Transition probability of going from the Good state to the Bad state
1-p         Transition probability of staying in the Good state
r           Transition probability of going from the Bad state to the Good state
1-r         Transition probability of staying in the Bad state

Table 4: Gilbert-Elliott model parameters
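To make the roles of these parameters concrete, the following sketch generates a synthetic loss trace from a Gilbert-Elliott chain. It is an illustration only: the function name simulate_gilbert_elliott is hypothetical, the chain is assumed to start in the Good state, and the defaults k = 1 and h = 0 are an assumption under which the Good state never loses a packet and the Bad state always does.

```python
import random


def simulate_gilbert_elliott(n, p, r, k=1.0, h=0.0, seed=None):
    """Return a string of n symbols, 'S' (success) or 'F' (failure).

    p: probability of moving Good -> Bad, r: probability of moving Bad -> Good,
    1 - k: packet error rate in Good, 1 - h: packet error rate in Bad.
    """
    rng = random.Random(seed)
    state = "G"  # assumption: start in the Good state
    trace = []
    for _ in range(n):
        error_rate = (1 - k) if state == "G" else (1 - h)
        trace.append("F" if rng.random() < error_rate else "S")
        # Draw the state transition for the next packet
        if state == "G":
            if rng.random() < p:
                state = "B"
        elif rng.random() < r:
            state = "G"
    return "".join(trace)


trace = simulate_gilbert_elliott(500, p=0.03, r=0.22, seed=1)
print(trace.count("F") / len(trace))
```

A trace generated this way can be fed back into an estimation procedure to see how well the parameters are recovered for a given trace length.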

The probabilities of being in the good state (P_r{G}) or in the bad state (P_r{B}), given the parameters from Equation 2, can be calculated in two ways. The first approach is the most straightforward one: write down the transition probabilities of each state by looking at Figure 9. (Note that the first approach assumes that the chain is in steady state.)

\[
\begin{cases}
P_r\{G\} = P_r\{B\} \cdot r + P_r\{G\} \cdot (1 - p) \\
P_r\{B\} = P_r\{G\} \cdot p + P_r\{B\} \cdot (1 - r) \\
P_r\{G\} + P_r\{B\} = 1
\end{cases}
\tag{5}
\]

The next step is to solve the linear equations by substituting the unknowns. (Note that the process is identical for both states.)

\[
\begin{aligned}
P_r\{G\} &= (1 - P_r\{G\}) \cdot r + P_r\{G\} \cdot (1 - p) \\
0 &= r - r \cdot P_r\{G\} - p \cdot P_r\{G\} \\
P_r\{G\} &= \frac{r}{r + p}
\end{aligned}
\tag{6}
\]

After solving the linear equations we end up with the following formulas for the state probabilities:

\[
P_r\{G\} = \frac{r}{r + p}; \quad P_r\{B\} = \frac{p}{p + r} \tag{7}
\]

Equation 7 is derived from Equation 5 by solving the linear equations. One can now plug the parameter values into Equation 7 to find the state probabilities.

If the current state of the Markov chain is unknown, approach two is the best way to go. Approach two can start from any arbitrary state distribution, since the state probabilities converge over time, and it is written in matrix form, which makes it easier for a computer to solve. Equation 8 gives the transition probability matrix for the Gilbert-Elliott model. In the beginning any arbitrary numbers can be chosen for P_r{G} and P_r{B}, as long as they satisfy P_r{G} + P_r{B} = 1:

\[
\begin{bmatrix} P_r\{G\} \\ P_r\{B\} \end{bmatrix}
=
\begin{bmatrix} 1 - p & r \\ p & 1 - r \end{bmatrix}
\times
\begin{bmatrix} P_r\{G\} \\ P_r\{B\} \end{bmatrix}
\tag{8}
\]

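The matrix equation can be solved by carrying out the matrix multiplication for both P_r{G} and P_r{B}. In Equation 9, \hat{P} and 1 − \hat{P} stand for P_r{G} and P_r{B} respectively.

\[
\begin{bmatrix} \hat{P} \\ 1 - \hat{P} \end{bmatrix}
=
\begin{bmatrix} (1 - p) \cdot \hat{P} + r \cdot (1 - \hat{P}) \\ p \cdot \hat{P} + (1 - r) \cdot (1 - \hat{P}) \end{bmatrix}
\tag{9}
\]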

From Equation 9 we can derive the probabilities of each state by inserting the known variables into Equation 10.

\[
\hat{P} = (1 - p) \cdot \hat{P} + r \cdot (1 - \hat{P}); \quad 1 - \hat{P} = p \cdot \hat{P} + (1 - r) \cdot (1 - \hat{P}) \tag{10}
\]
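Both approaches can be compared numerically. The sketch below repeatedly applies the transition matrix to an arbitrary starting distribution and compares the result with the closed-form steady state; the function names, the starting value 0.5 and the number of iterations are hypothetical choices made for illustration.

```python
def stationary_by_iteration(p, r, start_good=0.5, steps=200):
    """Apply the 2x2 transition matrix repeatedly to (P_r{G}, P_r{B})."""
    good, bad = start_good, 1.0 - start_good
    for _ in range(steps):
        # Both right-hand sides use the values from the previous step
        good, bad = (1 - p) * good + r * bad, p * good + (1 - r) * bad
    return good, bad


def stationary_closed_form(p, r):
    """Steady-state probabilities r / (r + p) and p / (p + r)."""
    return r / (r + p), p / (p + r)


print(stationary_by_iteration(0.03, 0.22))
print(stationary_closed_form(0.03, 0.22))
```

The iteration converges geometrically with rate |1 − p − r|, so a few hundred steps are more than enough for the parameter values used in this section.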

Let's consider an example where we have an input data stream consisting of 500 binary samples. Note that S to the power of X means X consecutive successful packet deliveries, for example S⁴ = SSSS, whereas F means a retransmission.

S⁶²FFS⁷⁷FS⁴⁶FFSFSFFFS¹¹FFS¹⁵FS⁴²FS²⁸FFS^9SFS³⁷FFS⁵ FFSFS³⁵FSFFSFS²³FFS⁴FFS¹⁸FS¹⁵FFSFFFSFFSFFFS⁵

With the help of Equation 2 we can extract the Gilbert-Elliott model parameters from any given bit stream. From Equation 1 we know that we need to count the occurrences of certain patterns in order to feed them into Equation 2. The patterns are counted in the following manner (a 1 bit denotes F and a 0 bit denotes S) and the results are given in Table 5.

• a: The number of 1 bits in the entire data stream divided by the number of samples.

• b: The number of pairs of 1 bits (two consecutive 1 bits) divided by the number of 1 bits.

• c: The share of triples with a 1 in the middle among all triples whose outer bits are both 1. It is calculated by dividing the number of 111 patterns by the sum of the number of 111 patterns and the number of 101 patterns.
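The counting rules above can be sketched in Python as follows. This is an illustration, not the thesis code: the stream is represented as a string of 'S' and 'F' characters, the toy stream is made up, and counting overlapping occurrences (so that FFF contains two FF pairs) is an assumption.

```python
def count_overlapping(stream, pattern):
    """Count occurrences of pattern in stream, allowing overlaps."""
    width = len(pattern)
    return sum(1 for i in range(len(stream) - width + 1)
               if stream[i:i + width] == pattern)


def estimate_abc(stream):
    """Estimate a, b and c from a stream of 'S'/'F' symbols.

    Raises ZeroDivisionError when the stream holds no F, or neither FFF nor FSF.
    """
    n_f = stream.count("F")
    n_fff = count_overlapping(stream, "FFF")
    n_fsf = count_overlapping(stream, "FSF")
    a = n_f / len(stream)
    b = count_overlapping(stream, "FF") / n_f
    c = n_fff / (n_fff + n_fsf)
    return a, b, c


# Hypothetical toy stream: 6 F symbols, 3 FF pairs, 1 FFF and 2 FSF patterns
a, b, c = estimate_abc("SSFFSFSFFFSS")
print(a, b, c)
```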


variable   pattern                              occurrence
a          P_r(F)                               38/500
b          P_r(F|F)                             15/38
c          P_r(FFF) / (P_r(FSF) + P_r(FFF))     3/10

Table 5: Gilbert-Elliott model example parameters

Now that the parameters (a, b, and c) are calculated, the next step is to insert them into Equation 2:

\[
\begin{aligned}
1 - r &= \frac{ac - b^{2}}{2ac - b(a + c)} = \frac{-0.13}{-0.10} = 1.29 \\
r &= 1 - 1.29 \\
r &= -0.29
\end{aligned}
\tag{11}
\]

As mentioned before, when the trace of c is too small (too few observations), the calculation breaks down. In Equation 11 we end up with a negative value for r, which is impossible for a probability. To solve this, the parameter c is discarded, 1 − r = 2b is used instead of (ac − b²)/(2ac − b(a + c)), and h is set to 0.5. The state probabilities are calculated using Equation 7. The results are shown in Table 6.

variable   result
r          0.22
1-r        0.78
p          0.03
1-p        0.97
P_r{G}     0.88
P_r{B}     0.12

Table 6: Gilbert-Elliott model result
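The estimation step, including the fallback for a short trace, can be sketched as a single function. The name gilbert_parameters and the convention of passing c=None to request the fallback are hypothetical; the formulas are those of Equation 2 and the simplification 1 − r = 2b with h = 0.5. Results computed without intermediate rounding may differ slightly in the last digit from tabulated values.

```python
def gilbert_parameters(a, b, c=None):
    """Return (p, r, h) from the measured quantities a, b and optionally c.

    With c given, Equation 2 is used. With c=None, the simplification
    h = 0.5 and 1 - r = 2b is used instead.
    """
    if c is not None:
        one_minus_r = (a * c - b ** 2) / (2 * a * c - b * (a + c))
        h = 1 - b / one_minus_r
    else:
        one_minus_r = 2 * b
        h = 0.5
    r = 1 - one_minus_r
    p = a * r / (1 - h - a)
    return p, r, h


# Full formula with the example values: r comes out negative
p_full, r_full, h_full = gilbert_parameters(38 / 500, 15 / 38, 3 / 10)
# Fallback without c: r and p are valid probabilities
p_simple, r_simple, h_simple = gilbert_parameters(38 / 500, 15 / 38)
print(r_full, r_simple, p_simple)
```

A caller can check whether the returned p and r lie between 0 and 1 and switch to the fallback only when they do not.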

4.2.1.1 Gilbert-Elliott model extension

The Gilbert-Elliott model takes only these three patterns (Equation 1) into consideration, which in most cases gives a good estimate of the network. It could happen (for short observations), however, that more patterns need to be included or that some need to be removed. This could be done by considering the Gilbert-Elliott model as a Hidden Markov Model trained by the Baum-Welch algorithm.


To easily find the interesting (recurring) patterns for the Gilbert-Elliott model, a pattern learning algorithm could be used. The learning algorithm is based on the well-known association rule learning algorithm, which is especially suitable for finding hidden relationships in big databases. Association rule learning was introduced to discover regularities between products in large-scale transaction data recorded by point-of-sale systems in supermarkets (also called market basket analysis). A famous example is a supermarket in the USA where association rule learning found that on Thursdays men bought diapers and beer together; with this knowledge the store could increase sales of these products by placing them together.

Association rule learning provides the user with frequent items found in the database, and it also generates rules from frequent itemsets. Another type of association rule learning is sequential pattern mining, a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. This type suits our requirements best, since the Gilbert-Elliott model does not require any rules, and since the database required for association rule learning must consist of a table with multiple columns and rows instead of a stream of data.

The mathematics behind sequential pattern mining is very straightforward, which makes it fairly easy to implement. Algorithm 1 explains the basic idea behind sequential pattern mining.

The algorithm requires as input a sequential data stream (dataStream) in list form and a pattern to look for (also in list form). First, a counter is initialized to keep track of how many occurrences (pattern hits) have been found. Second, the entire data stream is looped over. For each data item at position i, a check is executed to see whether the pattern still fits into the remaining data stream. If so, the pattern is looped over, and each time a pattern item matches the corresponding data-stream item the match count c is incremented. If, after looping over the pattern, c equals the length of the pattern, the pattern occurs in the data stream at position i and the counter is incremented. After looping over all the items of the data stream, the algorithm returns the fraction of positions at which this particular pattern occurred.

In order to find the X most frequent patterns, one needs to run the algorithm with different patterns and select the patterns which result in a high fraction.


Algorithm 1 Sequential pattern mining

PatternMining(dataStream[], pattern[])
    counter = 0
    loop dataStream with i do
        c = 0
        if pattern fits in dataStream then
            loop pattern with j do
                if dataStream[i + j] = pattern[j] then
                    c = c + 1
            if c = length(pattern) then
                counter = counter + 1
    return counter / length(dataStream)
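A direct Python transcription of Algorithm 1 might look as follows. It is a minimal sketch: the function name pattern_frequency is hypothetical, and the toy stream in the usage line is made up for illustration.

```python
def pattern_frequency(data_stream, pattern):
    """Fraction of positions in data_stream at which pattern occurs."""
    counter = 0
    for i in range(len(data_stream)):
        # Only compare when the pattern still fits in the remaining stream
        if i + len(pattern) <= len(data_stream):
            matches = sum(1 for j in range(len(pattern))
                          if data_stream[i + j] == pattern[j])
            if matches == len(pattern):
                counter += 1
    return counter / len(data_stream)


toy_stream = list("SFFSF")
print(pattern_frequency(toy_stream, list("F")))
print(pattern_frequency(toy_stream, list("FF")))
```

Candidate patterns up to a maximum length can be generated with itertools.product over the symbols S and F and then ranked by the value this function returns.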

Let's consider again the example with an input data stream consisting of 500 binary samples:

S⁶²FFS⁷⁷FS⁴⁶FFSFSFFFS¹¹FFS¹⁵FS⁴²FS²⁸FFS^9SFS³⁷FFS⁵ FFSFS³⁵FSFFSFS²³FFS⁴FFS¹⁸FS¹⁵FFSFFFSFFSFFFS⁵

After running the algorithm on this data stream with a maximum pattern length of 4 (used as a bound for generating patterns), one can see that the occurrence of the pattern F equals 0.07 and that of the pattern FFF equals 0.01. All the recommended patterns are listed in descending order in Table 7.

Pattern   Occurrence
F         0.07
FF        0.03
FSF       0.01
FFF       0.01
FSFF      0.01
FFSF      0.01
FSSF      0.00
FFFF      0.00

Table 7: Pattern learning example result

4.2.2 Measurement campaign

The purpose of the measurement campaign is to collect data for (possibly) several traffic situations in order to evaluate the network quality in a probabilistic manner. At the AstaZero test track all the traffic scenarios explained in Section 2.5 can be found. Even though a single test covering all the different traffic scenarios would be possible, it would not give accurate results. It is better to conduct a separate test for each traffic scenario in order to estimate the network quality better. Having scenario-specific test data could also help future research into V2I communication at Halmstad University.

In order to estimate the network quality, the system has to log relevant parameters which later help to construct a model for packet losses and retransmissions. Initially, what the system can do is periodically send a packet from the vehicle to the server. Such a packet has to include a time stamp, the GPS coordinates and the sequence number. Some extra variables that could be included in the packet are measurements from the vehicle's sensors, signal strength, encoding schemes, velocity, heading, trajectory history, etc. If more than one vehicle takes part in the test, it could be of great interest to also log which messages (and their content) the vehicles receive from each other in different traffic scenarios. For estimating the network quality, a test with only one vehicle is more than enough.

What the vehicle can log are all the periodically transmitted packets as well as the ARQs received from the base station. As mentioned in Section 2.2, almost all packet losses occur between the vehicle and the base station. Logging the ARQs could give a good estimate of the network quality, since the network quality is mostly determined by the number of retransmissions. Another useful quantity that could be extracted from the tests is the actual round trip time of a packet. In Section 4.1.1 we found that the theoretical average latency equals 13.67 ms (without any retransmission). This could easily be tested by making the receiving side (server) time stamp each received packet. During the data processing one could then find the actual transmission latency by subtracting the time the packet was sent from the time it arrived.

Route documentation is important to keep track of the positions of the vehicle; GPS coordinates are used to log the position during each measurement run. In addition to GPS data, videos of each measurement could also be recorded through the windscreen of the vehicle. GPS coordinates together with video information are used in the post-processing of the data to find the reasons whenever an unexpected but significant difference between the links is observed due to variations in traffic density, number of pedestrians, houses, roadside environment, etc.

There are some limitations and challenges involved with executing a measurement campaign. Some of them are listed below.

• The amount of data: A very important question that should be addressed is 'How much data is enough?' There is no single correct answer to this question: the more data available, the more accurate the prediction model will be. Will a 30 minute test (10 periodically sent packets per second ≈ 18000 packets per half hour) generate enough data to correctly predict the network quality?

• Amount of traffic: Does the amount of traffic on the network impact the test? Should one test while the network is completely free of traffic, or should the network be tested at its limits? In the case of AstaZero we would like to know the worst-case network quality, so a good approach is to have a dummy system downloading a big file from the network in order to simulate a lot of traffic.

• Test site: Should one execute one big test including all the different traffic scenarios, or should each test be specific to a certain traffic scenario (rural, city, highway, etc.)? The customers of AstaZero usually come to the track to test a specific scenario, and the measurements will be more accurate if they are scenario specific.

• Placing of the antenna:

• Accuracy of the GPS: The GPS’s accuracy might be a problem during the post-processing due to the well known phenomenon of "drifting".

• What clock to use: Another important factor to take into consideration is what type of clock to use to do all the logging since a couple of milliseconds in difference can result in the system assuming a retransmission has happened.

• Weather during the test: will rainy weather affect the network quality?

4.2.2.1 Execution of the measurement campaign

Unfortunately, due to the time restrictions of the thesis it was not possible to visit the AstaZero test track to execute the measurement campaign. Instead we decided to use Chronos demo log files (see Section 4.2.3) for the network analysis. The AstaMoCo project group will execute new measurement campaigns in the future.

4.2.3 Data post-processing

AstaZero conducted a test during their latest researchers day (October 2017). The test consisted of a handful of vehicles executing a set maneuver at a T-junction in their city area. Figure 10 shows the maneuver that each vehicle executed. Note that all the vehicles followed the Swedish traffic rules.

Each vehicle in Figure 10 is represented by a number (the object ID, see Section 4.2.3).


Figure 10: Test setup of AstaZero’s researchers day

• Robot car (ID 0): Was a fully autonomous (self-driving) vehicle.

• Human driver (ID 1): Was a regular car operated by a human driver.

• High speed platform (ID 3): Consisted of a wheeled, flat, metal platform which can be controlled remotely. On top of this platform a vast array of different balloons could be placed. These balloons could take the form of a car, bus, truck, or animal.

• Pedestrian (ID 4): Was a balloon shaped like a human strapped to an RC (radio controlled) car.

• Virtual object (ID 5) and Virtual truck (ID 6): Were objects that were implemented in software (not physically present during the test). The software program updates the server with position information of the planned trajectory.

Note that each object was equipped with the necessary communication and sensor equipment.

During the test each object periodically sends heartbeat packets to the test server. Upon receiving a packet, the server time stamps it with the current arrival time. In this way it is possible to determine how long the packet has travelled through the ether, as well as to estimate the number of retransmissions that took place.

The data set consists of a handful of variables logged by the object and the server, in the following format:

⟨ETSI time at message arrival, Message type, Object ID, Object type, ETSI time at message departure, Latitude, Longitude, Heading, Other sensor data⟩


• ETSI time at message arrival: The time at which the packet arrived at the receiver (server side). ETSI time is converted to milliseconds since 1970 by adding the number of milliseconds from 1970 up to 2004 (not including leap seconds) and the difference in leap seconds between ETSI and UTC time to the ETSI time itself.

t = ETSI_time + MS_1970_TO_2004 + DIFF_LEAP_SECONDS
t = ETSI_time + 1072915200000 + 5

• Message type: Indicates the type of traffic. In this data set it is always set to 3, which stands for monitor data.

• Object ID: Specifies the sender of the message.

• Object type: Was not specified in the explanation of the data set. By default it was set to 0 for all messages.

• ETSI time at message departure: The time at which the message was sent by the object.

• Latitude and Longitude: Are the degrees latitude and longitude of the test object gathered from its GPS sensor.

• Heading: Gives the clockwise degrees from north.

• Other: Some other sensor data collected by the object.
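The time conversion described in the first item of this list can be sketched as follows. The constants carry the values given in the text; the function names are hypothetical, and the two time stamps in the usage lines are the arrival and departure values of the example row from the log.

```python
from datetime import datetime, timezone

MS_FROM_1970_TO_2004_NO_LEAP_SECS = 1072915200000  # value given in the text
DIFF_LEAP_SECONDS_UTC_ETSI = 5                     # value given in the text


def epoch_2004_ms():
    """Milliseconds between 1970-01-01 and 2004-01-01 (UTC, no leap seconds)."""
    return int(datetime(2004, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)


def etsi_to_unix_ms(etsi_time_ms):
    """Shift an ETSI time stamp to milliseconds since 1970, as in the formula."""
    return (etsi_time_ms
            + MS_FROM_1970_TO_2004_NO_LEAP_SECS
            + DIFF_LEAP_SECONDS_UTC_ETSI)


arrival = etsi_to_unix_ms(446390418570)
departure = etsi_to_unix_ms(446390418509)
transit_ms = arrival - departure
print(arrival, departure, transit_ms)
```

Since both time stamps are shifted by the same offset, the transit time does not depend on the offset itself; the conversion only matters when the time stamps are compared with other clocks.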

The following row is an example taken from the data. Keep in mind that semicolons and colons are mixed and that the spacing is inconsistent.

⟨446390418570 : 3 6; 0; 446390418509; 577724328; 127697649; 20325; 0; 161; 0; 2;⟩

Before the 20 103 lines of logging data could be used for calculations, a couple of steps had to be executed. The first step is to remove the inconsistency of the colons, semicolons, and spaces. This was done by replacing all spaces with a semicolon and replacing the colons with an empty string. This brought consistency to the data log, and by splitting on the semicolon one can create a list of lists which can easily be looped over, ordered, etc. The program also drops a line of data when that line is not exactly 12 items long or when it contains empty items (an empty string). Another task of the program is to calculate the time difference between the sending and receiving time of the message. This time difference is appended to the end of each list.


Python was used to process all the data for the simple reason that Python is a simple and structured scripting language which is very powerful for data processing. Python also comes with a large number of libraries which make the life of the developer a bit easier. The developed Python code can be found in Listing 1.


# offsets used to convert ETSI time to milliseconds since 1970
MS_FROM_1970_TO_2004_NO_LEAP_SECS = 1072915200000
DIFF_LEAP_SECONDS_UTC_ETSI = 5

def openAndParseLogFile():
    dataList = []
    with open('event.log') as f:
        lines = f.readlines()
    for line in lines:
        # parse into list format
        r = line.replace(' ', ';').replace(':', '').split(';')
        # remove the '\n'
        r = r[:-1]
        # only append the good lines
        if len(r) == 12 and '' not in r:
            b = []
            for i in range(len(r)):
                if i == 0 or i == 4:
                    # change ETSI time to millisec
                    time = (int(r[i]) + MS_FROM_1970_TO_2004_NO_LEAP_SECS
                            + DIFF_LEAP_SECONDS_UTC_ETSI)
                    b.append(time)
                else:
                    b.append(int(r[i]))
            # calc difference in time
            timeDif = b[0] - b[4]
            b.append(timeDif)
            dataList.append(b)
    return dataList

Listing 1: Python code for parsing the log file

The second step was to discard the messages which were sent by ID 2 (unknown object). There are 61 samples in the data set with this ID. The final step was to convert the latitude and longitude items from nine-digit integers to floats, with the decimal separator inserted after the seventh digit counting from right to left (for example, 577724328 becomes 57.7724328).
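These two clean-up steps can be sketched as follows. The column indices follow the order of the parsed rows (object ID at index 2, latitude and longitude at indices 5 and 6); the function name clean_rows and the toy rows are hypothetical.

```python
OBJECT_ID_INDEX = 2
LAT_INDEX, LON_INDEX = 5, 6
UNKNOWN_OBJECT_ID = 2
COORD_SCALE = 10 ** 7  # seven digits behind the decimal separator


def clean_rows(rows):
    """Drop rows sent by the unknown object and scale coordinates to degrees."""
    cleaned = []
    for row in rows:
        if row[OBJECT_ID_INDEX] == UNKNOWN_OBJECT_ID:
            continue
        row = list(row)  # copy, so the input rows stay untouched
        row[LAT_INDEX] = row[LAT_INDEX] / COORD_SCALE
        row[LON_INDEX] = row[LON_INDEX] / COORD_SCALE
        cleaned.append(row)
    return cleaned


# Hypothetical toy rows: one from object 6 and one from the unknown object 2
toy_rows = [
    [0, 3, 6, 0, 0, 577724328, 127697649, 20325],
    [0, 3, 2, 0, 0, 577724328, 127697649, 20325],
]
cleaned = clean_rows(toy_rows)
print(cleaned)
```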

From the data a number of parameters can be extracted using simple logic and arithmetic. The results can be found in Table 8.

• Amount of samples: Gives the number of samples each object has sent to the server. Values vary between 300 and 6000 samples. The number of samples is calculated by simply asking Python for the length of each list.

• Minimum and maximum transmission time: The minimum and maximum transmission time of each object in milliseconds. Python has the built-in functions min() and max(), which make it very easy to find the minimum or maximum transmission time.


• Average transmission time: The average transmission time is calculated by iterating over all the transmission times of each object, summing them, and finally dividing the sum by the number of samples of that object.

• Standard deviation: The NumPy library is used for calculating the standard deviation. The function takes a list as its argument and returns the standard deviation of that list.

• Test duration: Gives the duration of the test in minutes. It is calculated by subtracting the element "time at transmission" of the first line from the element "time at transmission" of the last line (since the messages are by default sorted on arrival time). This is repeated for each object separately.

• Sampling interval: The average time between the sending of two consecutive packets. It is calculated by iterating over each object and calculating the time between packet number N and packet number N+1. All these time differences are added together and divided by the number of samples. The unit of the sampling interval is milliseconds, but it can easily be converted into a frequency (Hz) with the following equation:

frequency (Hz) = 1 / period (s).
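The per-object statistics listed above can be sketched with the standard library alone. The function name summarize_object and the toy input are hypothetical; statistics.pstdev is used here as a stand-in for NumPy's std, whose default setting also computes the population standard deviation.

```python
import statistics


def summarize_object(departure_ms, transit_ms):
    """Summary statistics for one object.

    departure_ms: sending times in ms, in log order.
    transit_ms: transmission time of each packet in ms.
    """
    n = len(transit_ms)
    gaps = [later - earlier
            for earlier, later in zip(departure_ms, departure_ms[1:])]
    interval_ms = sum(gaps) / n  # divided by the number of samples, as in the text
    return {
        "samples": n,
        "min_ms": min(transit_ms),
        "max_ms": max(transit_ms),
        "avg_ms": sum(transit_ms) / n,
        "std_ms": statistics.pstdev(transit_ms),
        "duration_min": (departure_ms[-1] - departure_ms[0]) / 60000,
        "interval_ms": interval_ms,
        "frequency_hz": 1000 / interval_ms,
    }


# Hypothetical toy data: four packets sent 10 ms apart
summary = summarize_object([0, 10, 20, 30], [5, 7, 6, 6])
print(summary)
```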

Another interesting parameter that can be extracted from the data is the speed of each vehicle over time. Figure 11 shows the speed at any given point in time together with the transmission time (same X-axis). The entire duration of the test is shown in the graph (roughly 1 minute). As we can see, all objects stand still for the vast majority of the time, then execute their maneuver (crossing a T-junction with some traffic), and then the test is over. Examining the speed and the transmission time in Figure 11 shows a slight correlation between the speed of an object and its transmission time. For example, the high-speed platform (red line) starts accelerating at around 47500 (X-axis), and at roughly the same time its transmission time rises from ≈44 ms to ≈52 ms. When it started to decelerate at around 58000, the transmission time also went down, with a small delay of a couple of seconds. The reason for this small delay could be that there were still quite a few packets waiting in the transmission queue. To prove that there is a correlation between transmission time and speed, more data is needed. If there is no such correlation, the transmission time going up at the same time as the speed could be explained by the object starting to lose its line-of-sight connection once it moves, or by clock drift (which is very unlikely).

Figure 11: Speed and transmission time over time

The speeds are calculated from the difference between the latitude and longitude coordinates of packet N and those of packet N+1. This gives the distance travelled between packet N and packet N+1. The speed is then obtained by dividing the distance by the difference in time between the generation of packets N and N+1. Listing 2 presents the algorithm, which takes a list of GPS coordinates as input and outputs a list of speeds. The formula used in this algorithm takes the curvature of the earth into account instead of using a 2D plane. The stepVal variable makes the program take jumps of 5 packets between N and N+1, because otherwise the speed plot became extremely spiky. An average of those 5 values is used for the transmission time (not shown in the code snippet).


from math import radians, sin, cos, atan2, sqrt

def calculateSpeedFromCoordinates(data):
    # approximate radius of earth in km
    R = 6373.0
    stepVal = 5
    speeds = []
    for i in range(0, len(data) - stepVal, stepVal):
        # convert degrees into radians for python
        latN = radians(float(data[i][5]))
        lonN = radians(float(data[i][6]))
        latNp1 = radians(float(data[i + stepVal][5]))
        lonNp1 = radians(float(data[i + stepVal][6]))

        # calculate difference between the two points
        dlon = lonNp1 - lonN
        dlat = latNp1 - latN

        a = sin(dlat / 2) ** 2 + cos(latN) * cos(latNp1) * sin(dlon / 2) ** 2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        distance = R * c  # in km

        timeDif = data[i + stepVal][4] - data[i][4]  # in ms
        timeDifH = timeDif / 3.6e6  # in hours

        if timeDifH != 0:
            speed = distance / timeDifH  # in km/h
            speeds.append(speed)
    return speeds

Listing 2: Python code for calculating speed from GPS coordinates

Another interesting graph that can be generated from the data is a histogram of the transmission times. With this histogram it is possible to see how the transmission times are grouped. Figure 12 gives the transmission times for all objects (except the object with ID 2).

As we can see, the virtual object and the virtual truck (ID 5 and ID 6 respectively) send all their packets with a transmission delay of 1 ms (virtual object) and 61 ms (virtual truck). This is because they are programmed in that manner; they do not actually use the LTE channel. With this information it is safe to say that these two objects can be removed from the analysis, since they do not contribute information about the network quality. Moreover, they might distort the results if kept in the data set.


Vehicle                      Robot car (ID 0)   Human driver (ID 1)   Unknown (ID 2)   High speed platform (ID 3)
Amount of samples            4754               304                   61               3977
Min transmission time (ms)   24                 10                    446390419080     30
Max transmission time (ms)   169                101                   446390479303     77
Avg transmission time (ms)   40                 14                    446390449194     47
Standard deviation           10.671             7.269                 17670.618        7.491
Average speed (km/h)         4.454              3.858                 -                2.962
Test duration (minutes)      1.013              1.009                 0.000            1.013
Sampling interval (ms)       12.791             199.335               0.0              15.287

Vehicle                      Pedestrian (ID 4)  Virtual object (ID 5)  Virtual truck (ID 6)
Amount of samples            1354               3561                   6085
Min transmission time (ms)   5                  0                      60
Max transmission time (ms)   66                 2                      62
Avg transmission time (ms)   9                  0                      61
Standard deviation           3.688              0.520                  0.465
Average speed (km/h)         0.593              -                      -
Test duration (minutes)      1.013              1.013                  1.014
Sampling interval (ms)       44.891             17.081                 9.998

Table 8: Summary of the AstaZero test log data


Figure 12: Histogram of transmission times from each test object

Figure 13 visualizes the spread of the transmission times when the virtual objects are removed (and the transmission times of all remaining objects are combined into a single data set). From this histogram one can tell that there are two major groups: one with transmission times of ≈(5-20) ms and one with ≈(25-60) ms. These two groups are also visible in the plot of transmission time over time (Figure 11), where there are two distinct groups with a clear gap between them: the red and blue lines are clearly bundled together, and so are the purple and green lines. Section 4.2.4 tries to explain why these groups appear.

In Figure 14 the CDF of each object is plotted. Here the two groups are also visible. One can also see that the virtual objects can be removed, since their CDFs are unrealistic in comparison with the other objects.
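An empirical CDF of this kind can be computed from the measured transmission times as sketched below. The function names and the toy samples are hypothetical and only illustrate the construction.

```python
def empirical_cdf(samples):
    """Return sorted samples and the fraction of samples <= each of them."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def fraction_within(samples, deadline_ms):
    """Empirical probability that a packet arrives within deadline_ms."""
    return sum(1 for s in samples if s <= deadline_ms) / len(samples)


# Hypothetical toy transmission times in ms
xs, ps = empirical_cdf([30, 10, 20, 40])
print(xs, ps)
print(fraction_within([30, 10, 20, 40], 20))
```

Plotting ps against xs as a step function gives a curve of the type shown per object, and fraction_within reads off a single point of that curve.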

4.2.4 Data matching

This section compares the data and the graphs generated from it with the earlier measurements, which resulted from a measurement campaign executed at the AstaZero track. It also presents a solution for how the differences could be resolved.

Figure 15 shows both CDFs next to each other. It is fairly easy to spot the differences between the two. First of all, the CDF from the earlier