How Different Parameters Affect the Downlink Speed


Linköping University | IDA Bachelor Thesis | Computer and Information Science Spring 2016 | LIU-IDA/LITH-EX-G--16/071--SE


Martin Claesson

Lovisa Edholm

Supervisor: Niklas Carlsson; Examiner: Nahid Shahmehri


Abstract

Today many societies rely on fast mobile networks, and the future seems to place even larger demands on network performance. This thesis analyzes which parameters affect the downlink speed of mobile networks. Various statistical analyses are performed on a large dataset provided by Bredbandskollen. We find that parameters such as the internet service provider, the type of phone, the time of day and the population density affect the downlink speed. We also find that the downlink speeds are significantly higher in urban areas compared to more rural regions.


Acknowledgments

Thanks to Rickard Dahlstrand at .SE for sharing the Bredbandskollen dataset; without the dataset this study would not have been possible. Thanks to our supervisor Niklas Carlsson for all the great guidance in this project. We would also like to thank Tim Lestander and Jakob Nilsson for proofreading our thesis as well as providing us with helpful feedback during the process.


Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
    1.1 Contributions
    1.2 Thesis outline
2 Related Work
3 Method
    3.1 The dataset
    3.2 Linear regression
4 Results
    4.1 Urban and rural areas
    4.2 Phones and tablets
    4.3 Time of day
    4.4 Distance from city center
    4.5 Density of measurements
5 Discussion
    5.1 Method
    5.2 Results
    5.3 Work in a wider context
6 Conclusion
    6.1 Possible future works


List of Figures

1.1 Grids sized 1600x1600, 800x800, 400x400 and 200x200 meters.

1.2 Doughnut-shaped rings with a width of 100 meters and a radius between 100 and 2500 meters. Placed with the city center in the center of the circles.

4.1 Average downlink speed for different ISPs in the data set.

4.2 Average downlink speed for different ISPs in Stockholm, Göteborg, Malmö, Uppsala and Linköping.

4.3 Average downlink speed for different mobile units for Telia.

4.4 Average downlink speed for different mobile units for Tele2.

4.5 Average downlink speed for different mobile units for Telenor.

4.6 Total number of measurements per time of day.

4.7 Average downlink speed per time of day for Telia.

4.8 Average downlink speed per time of day for Tele2.

4.9 Average downlink speed per time of day for Telenor.

4.10 Linear regression for Telia, with average downlink speed on the y-axis and number of measurements (interval for each hour of the day) on the x-axis. The equation for the linear regression is $y = 8 \cdot 10^{-5}x + 14.394$ and the value for the coefficient of determination ($R^2$) is 0.20671.

4.11 Linear regression for Tele2, with average downlink speed on the y-axis and number of measurements (interval for each hour of the day) on the x-axis. The equation for the linear regression is $y = -0.0005x + 25.089$ and the value for the coefficient of determination ($R^2$) is 0.86754.

4.12 Linear regression for Telenor, with average downlink speed on the y-axis and number of measurements (interval for each hour of the day) on the x-axis. The equation for the linear regression is $y = -0.0004x + 21.475$ and the value for the coefficient of determination ($R^2$) is 0.89060.

4.13 Grids sized 1600x1600, 800x800, 400x400 and 200x200 meters.

4.14 Linear regression with average downlink on the y-axis and number of measurements per region in each square on the x-axis.


List of Tables

4.1 Difference in performance between devices.

4.2 Linear regression in grids of size 1600x1600, with average downlink speed (Mbps) on the y-axis and distance to city center (km) on the x-axis.

4.3 Linear regression of 25 circles of varying distance from city center, with average downlink speed (Mbps) on the y-axis and distance to city center (km) on the x-axis.

4.4 Linear regression with average downlink speed (Mbps) on the y-axis and density of measurements per region on the x-axis.


Students in the 5 year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates by demonstrating a working product and a written report documenting the results of the practical development process including requirements elicitation. During the final stage of the semester, students form small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialization work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.


1 Introduction

In a disaster scenario, the need for a fast and reliable network is critical. Today, almost everyone owns a battery-powered device such as a laptop, a smartphone or a tablet. Since the transfer of data is very energy consuming, it is important to reduce the time spent downloading data in order to maximize the device's uptime. This is especially important in a disaster scenario, when access to charging points may be very limited.

Smartphone usage is very diverse, and a study by Falaki et al. [4] has shown that the average number of interactions per user per day varies from 10 to 200. The same study shows that the amount of data received by each user varies between 1 and 1000 MB. Network performance varies depending on various conditions. To understand the current network conditions, models such as performance maps are valuable. In this thesis, we analyze network capabilities by using crowd-sourced network measurements and network performance maps summarizing these measurements. We use this data to create a multivariate model for mobile download speeds.

This is done using a large crowd-sourced dataset provided by Bredbandskollen, Sweden's primary internet speed test provider. By January 2016 it had been used to perform over 187 million measurements, and today about 100,000 measurements are performed using Bredbandskollen every day. This thesis focuses on the 16 million mobile non-WiFi measurements made between January 2014 and February 2015. Using simultaneously collected meta information, such as operator and geographic location, we identify factors that impact the downlink speed. The locations in Sweden have been split into different regions. To be able to use the techniques and software required for the analysis, we aggregate measurements and downsample the dataset.


1.1 Contributions

This thesis makes two primary contributions. The first is to identify factors of potential interest that may impact internet speed. For this purpose, we characterize the mobile speed test usage of Bredbandskollen by creating a performance map that fits our needs. The measurements follow a diurnal pattern (with a peak-to-valley ratio of 16) and are highly concentrated to regions where most people live, with a small number of geographical locations being responsible for most measurements. To allow for efficient analysis we therefore focus our analysis on the more frequent locations.

The second contribution is to look at how well each of the candidate factors satisfies the assumptions and, if needed, to transform variables. This is done by studying each factor in different contexts and determining whether it satisfies the assumptions or not.

Cluster Analysis

To cluster the data we use three methods. First, similar to prior works analyzing this dataset [7], we divide the data into square buckets with sizes between 200x200 and 1600x1600 meters, as seen in Figure 1.1. Second, when considering the proximity of the measurements to a city center, we analyze the data within doughnut-shaped rings with a width of 100 meters and a radius between 100 and 2500 meters (Figure 1.2). The third and last method is to divide the dataset within the 1600x1600 bucket (the largest bucket used in Figure 1.1) into nine equally sized squares. Since the dataset is sparse, we aggregate many measurements from a geographical area with similar characteristics, and perform analysis on this aggregated data [11]. For the purpose of our discussion, we call all measurements from such a location a group of clustered measurements.
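The first two clustering methods can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: it assumes that each measurement has already been converted to planar coordinates in meters, and the function names and the tuple layout `(x_m, y_m, speed_mbps)` are hypothetical.

```python
import math
from collections import defaultdict


def square_bucket(x_m, y_m, cell_size_m):
    """Return the (column, row) index of the square cell containing a point."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def ring_index(x_m, y_m, center_x_m, center_y_m,
               ring_width_m=100.0, max_radius_m=2500.0):
    """Return the 0-based index of the ring containing a point.

    Ring 0 covers distances 0-100 m from the center, ring 1 covers 100-200 m,
    and so on. Points at or beyond max_radius_m are not assigned to any ring.
    """
    distance = math.hypot(x_m - center_x_m, y_m - center_y_m)
    if distance >= max_radius_m:
        return None
    return int(distance // ring_width_m)


def mean_speed_per_group(measurements, key_fn):
    """Average the speed of all measurements that share the same group key.

    measurements: iterable of (x_m, y_m, speed_mbps) tuples (assumed layout).
    key_fn: maps (x_m, y_m) to a group key, or None to skip the measurement.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x_m, y_m, speed_mbps in measurements:
        key = key_fn(x_m, y_m)
        if key is None:
            continue
        sums[key] += speed_mbps
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Passing `lambda x, y: square_bucket(x, y, 200.0)` as `key_fn` yields one average per 200x200 meter bucket, while a lambda wrapping `ring_index` yields one average per ring around a chosen center.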


Figure 1.2: Doughnut-shaped rings with a width of 100 meters and a radius between 100 and 2500 meters. Placed with the city center in the center of the circles

1.2 Thesis outline

This thesis is structured as follows. Section 2 presents related works and the theoretical background for the thesis. In Section 3 the methodology used to find the results is presented, together with some explanations of the dataset that is used for the study. Section 4 presents the results from the study. Section 5 contains a discussion of the methodologies and the results from the previous chapter. The last section presents a conclusion and suggestions for further work.


2 Related Work

In a disaster scenario, the need for a reliable network for communication is essential. By analyzing previous measurements of download speed in different locations, it is possible to make predictions that can be used to improve Quality of Service (QoS) [13]. Such predictions are easier to make within smaller areas, since those areas have a lower covariance [7]. Areas with high downlink speeds tend to have neighbouring areas with high downlink speeds, which makes predictions easier in areas with a high downlink speed. Due to this fact, a user can travel through an urban area and experience similar download speeds.

Network prediction maps are useful when trying to increase the download speed [3], which facilitates communication over the network. Yao et al. [13] show that historical data from a given place can provide a good picture of the speed you can expect at the same place. In this thesis we investigate the factors that affect the download speed and in what way they affect it. According to a 2013 study by Yu and Kim [14], increasing the density of base stations increases the density of successful transmissions, but with diminishing returns. This means that the number of base stations must be increased by more than a factor of n to increase the network capacity by a factor of n.

According to Jang et al. [6], the QoS is lower on a device that moves at high speed than on one that does not. In that paper they studied network performance in a fast-moving car on a highway and in a high-speed train running at 300 km/h, and found that mobile nodes experience far worse performance than stationary nodes over the same network. One source of error in our analysis is that our data set does not contain information about whether the devices are moving or not.

The network technologies used will also affect the QoS [6][7]. The different wireless link characteristics of 3G and 3.5G will make the performance vary greatly between the different network types, although they all support enough bandwidth for typical Internet applications.


3 Method

This chapter focuses on the method used to analyze the downlink speed, and how different parameters affect it. It begins with an explanation of the dataset used, and continues with an in-depth explanation of linear regression and the coefficient of determination.

3.1 The dataset

This study is, as previously mentioned, based on a dataset provided by Bredbandskollen. Bredbandskollen is a Swedish speed test service that is provided by The Internet Foundation in Sweden (IIS), an independent organization that promotes the positive development of the Internet in Sweden. The dataset is crowd-sourced by mobile users testing their Internet speed on their tablet or phone using Bredbandskollen's iOS or Android application. The application provides information about the uplink and downlink speed, as well as information regarding geographic location, latency and timestamps. The tests are carried out against the geographically closest Internet exchange point. The dataset contains data from 2008-2015, but since the speed has increased so much between these years, we narrow the dataset down to only the years 2014-2015 to avoid misleading results that may derive from the great variations in downlink and uplink speed. The Internet Service Providers (ISPs) that are analyzed are Telia, Tele2 and Telenor. The different network technologies used in the dataset are:

• EDGE (Enhanced Data rates for GSM Evolution, 2.5G)
• GPRS (General Packet Radio Services, 2.5G)
• CDMA (Code Division Multiple Access, 3G)
• UMTS (Universal Mobile Telecommunications System, 3G)
• HSPA (High-Speed Packet Access, 3.5G)
• HSDPA (High-Speed Downlink Packet Access, Turbo 3G)
• HSPAP (High-Speed Packet Access Plus, 3G)


3.2 Linear regression

For most of the analysis we have used linear regression [8]. Using linear regression we develop a function that creates an approximation-based model of the relationship between variables observed in the data from Bredbandskollen. In statistics, linear regression is typically used to determine if there is a statistical link between a response variable and one or more explanatory variables. If more than one explanatory variable is used, it is commonly referred to as multiple linear regression.

For every value of x there is a probability distribution for y, given by

$$y = \beta_0 + \beta_1 x + \varepsilon, \tag{3.1}$$

where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon$ is an uncorrelated error component. The variance is given by the equation

$$\mathrm{Var}(y \mid x) = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2. \tag{3.2}$$

By comparing a large number of measurements from Bredbandskollen we will approximate the impact of different variables on the bandwidth.

Regression models are primarily used for four different purposes, including

• data description,
• parameter estimation,
• prediction and estimation and
• control.

Regression analysis is a helpful tool when developing equations to summarize or describe a type of data. Especially when working with a large dataset, as we are, a regression model may provide a more useful and convenient summary than a table or a graph [12]. A parameter estimation problem may also be solved by using a regression model. In such a case, regression analysis can be used to fit the data to the model and thus provide an estimate of a variable.

Regression models are also useful for predicting the response variable. For example, if we have a lot of data concerning network traffic, the network speed at a specific location can be predicted by using a regression model. In this case, it is important that the estimates of the model parameters are good, since a poor estimate may result in a poor prediction. Errors in the model or the equation may also result in a poor prediction.

The fourth common area of usage for regression models is control. When a regression equation is used for this purpose, it is important that the variables are related in a causal manner. This cause-and-effect relationship is not needed for prediction, since in that case it is only necessary that the relationships that existed in the original data used for building the model are still valid.

Creating a regression model is an iterative process. An initial regression model is specified by using any knowledge about the data from Bredbandskollen. In the next step the parameters of the model are estimated and evaluated, as is the model adequacy. This is repeated until an appropriate model is obtained.


Coefficient of determination

The coefficient of determination ($R^2$), sometimes referred to as the squared multiple correlation coefficient, is well established in classical statistical analysis. It is defined as the proportion of variance explained by the regression analysis, which makes it useful as a measure of success of predicting the dependent variable from the independent variable [9]. The coefficient of determination is defined as follows:

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}. \tag{3.3}$$

In the equation above, $SS_T$ is a measure of the variability of y without considering the effect of the regressor variable x, $SS_{Res}$ is a measure of the variability in y remaining after x has been considered, and $SS_R = SS_T - SS_{Res}$ is the variability explained by the regression. Therefore, $R^2$ is often called the proportion of variance explained by the regressor x.

The value of $R^2$ varies between 0 and 1, and can be seen as a percentage of how much of the variability is accounted for by the model; e.g., if the value of $R^2$ is 0.43, that means that 43 percent of the variability in the response variable is accounted for [8].
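As an illustration of equations (3.1) and (3.3), the sketch below fits a simple linear regression by ordinary least squares and computes the coefficient of determination as one minus the ratio of the residual to the total sum of squares. It is a minimal example with hypothetical function names, not the tooling used in the thesis.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def coefficient_of_determination(xs, ys, intercept, slope):
    """Return R^2 = 1 - SS_Res / SS_T for the given fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    return 1.0 - ss_residual / ss_total
```

For data that lie exactly on a line the residual sum of squares is zero and $R^2$ equals 1; the more the points scatter around the fitted line, the closer $R^2$ gets to 0.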


4 Results

In this chapter we look at how different parameters affect the downlink speed and in what way they affect it. Looking at the first measurements of the data set we discovered clear patterns in the downlink speed.

4.1 Urban and rural areas

It turned out that the downlink speed was significantly higher in urban areas compared to the rest of Sweden. This result suggests that factors that are different in urban locations in comparison to more rural areas affect the speed in some significant way for all of the ISPs. To show this we include results for both the entire dataset and for the data collected in five of the biggest cities in Sweden: Stockholm, Gothenburg, Malmö, Uppsala and Linköping.


Figure 4.1 shows the average downlink speed for the ISPs in all of Sweden. Confidence intervals with a confidence level of 95 percent have been added to the graph. However, since the dataset is so large, the confidence intervals are so small that they are barely visible. For this reason it was decided not to include confidence intervals in the rest of the graphs in this thesis. Note that the average downlink speed for Telia is 18.49 Mbps, the average downlink speed for Tele2 is 18.15 Mbps, and the average downlink speed for Telenor is 15.47 Mbps. Telia and Tele2 show a significantly higher downlink speed than Telenor.
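To illustrate why the intervals become barely visible for a large dataset, the sketch below computes a normal-approximation 95 percent confidence interval for a mean. The width of the interval is proportional to the standard error, which shrinks roughly as one over the square root of the number of samples. The function name and the default z value of 1.96 are assumptions made for illustration; the thesis does not state how its intervals were computed.

```python
import math
import statistics


def mean_confidence_interval(samples, z=1.96):
    """Return (lower, upper) bounds of an approximate confidence interval.

    Uses the normal approximation mean +/- z * s / sqrt(n), where s is the
    sample standard deviation. z = 1.96 corresponds to a 95 percent level.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(n)
    return mean - z * std_error, mean + z * std_error
```

With the same spread in the data, an interval based on 10,000 samples is about ten times narrower than one based on 100 samples.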

Figure 4.2 shows the difference between the average downlink speed for the different ISPs in urban parts of Sweden.

Figure 4.2: Average downlink speed for different ISPs in Stockholm, Göteborg, Malmö, Uppsala and Linköping

From Figure 4.2 we see that the behavior in the urban areas is different from the behavior in Sweden overall. Here Tele2 has a significantly higher downlink speed than the other ISPs. Tele2 has an average downlink speed of 26.36 Mbps, Telia has an average downlink speed of 22.21 Mbps and Telenor has an average downlink speed of 22.19 Mbps.

These two charts indicate that Telenor and Telia perform similarly under the conditions typical of urban areas, while Tele2 and Telia perform similarly in Sweden overall.

4.2 Phones and tablets

The network speed may vary between the types of phones that are used. For example, a newer phone may be able to use newer technologies, and its network interface card may be more advanced. Due to this, we decided to see if any significant differences could be seen between different types of phones. Of course, in a dataset as large as the one from Bredbandskollen many different kinds of phones are used, and some of them appear in just a few measurements. Therefore we decided to exclude the ones with fewer than 500 measurements. The results are separated between the three major ISPs: Telia, Tele2 and Telenor. This is done to avoid differences caused by characteristics of the ISPs being mistaken for differences caused by the devices.


Figure 4.3: Average downlink speed for different mobile units for Telia

Figure 4.4: Average downlink speed for different mobile units for Tele2

Figure 4.5: Average downlink speed for different mobile units for Telenor

As seen in Figures 4.3, 4.4 and 4.5, the tablets from Xperia and the Samsung SM phones have a high average downlink speed for all of the ISPs. iPhone, iPad and HTC have among the lowest average downlink speeds in all three cases. The results vary significantly between the different types of phones and tablets, and it could be said that the choice of phone or tablet may affect the downlink speed. One limitation of this analysis is that geographic locations are not taken into account; e.g., a new phone such as OnePlus may have a higher average downlink speed if a larger portion of its measurements are made in an urban area. In general, each type of phone has similar values for each of the three ISPs. The one exception is the iPad, which for some reason has a low average downlink speed for Tele2. The difference between the two tablets (iPad and Xperia) is significant and bigger than we expected.

To get an overall picture of the average performance, the devices were ranked within every ISP, where the best performing device was ranked as number 1 and the worst performing device was ranked as number 11. Table 4.1 shows the devices with the highest rank first, as evaluated across the three most popular ISPs. We also computed the average speed for every device by dividing the sum of its average speeds within the three ISPs by three.

Device           Average rank   Best rank   Worst rank   Average downlink   Quantity
XPERIA Tablets   2              1           3            24.30 Mbps         4,635
OnePlus          2.6            1           4            22.42 Mbps         8,945
Samsung SM       3              2           4            22.74 Mbps         201,273
Nexus            4              2           5            21.34 Mbps         25,986
Huawei           5              1           7            21.57 Mbps         244,348
XPERIA           5.3            4           7            20.57 Mbps         189,366
LG               7.3            6           10           17.50 Mbps         17,241
HTC              8.3            6           11           16.48 Mbps         88,779
Samsung          9              8           10           16.67 Mbps         134,968
Iphone           9.3            9           10           15.36 Mbps         572,178
Ipad             9.3            8           11           13.16 Mbps         76,320

Table 4.1: Difference in performance between devices.

In Table 4.1 we see that XPERIA tablets have the best ranking and also have a significantly higher average downlink speed than the other devices. Looking at, for example, OnePlus, which has a better rank than Samsung SM, we see that Samsung SM has a higher average downlink speed than OnePlus. This also applies to Nexus in comparison to Huawei, and HTC in comparison to Samsung. Overall we see that the performance patterns of the different devices are very similar within the different ISPs.
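The ranking procedure described above can be sketched as follows. The function name and the nested dictionary layout are hypothetical, and the sketch assumes that every device has an average speed for every ISP. It also shows how a device can have a better average rank but a lower average speed than another device, as observed for OnePlus and Samsung SM.

```python
def rank_devices(speeds_by_isp):
    """Rank devices within each ISP and summarize the ranks per device.

    speeds_by_isp: {isp_name: {device_name: average_speed_mbps}}, where every
    device is assumed to appear under every ISP.
    Rank 1 is the device with the highest average speed within an ISP.
    """
    ranks = {}
    for speeds in speeds_by_isp.values():
        ordered = sorted(speeds, key=speeds.get, reverse=True)
        for position, device in enumerate(ordered, start=1):
            ranks.setdefault(device, []).append(position)

    summary = {}
    for device, device_ranks in ranks.items():
        device_speeds = [speeds[device] for speeds in speeds_by_isp.values()]
        summary[device] = {
            "average_rank": sum(device_ranks) / len(device_ranks),
            "best_rank": min(device_ranks),
            "worst_rank": max(device_ranks),
            "average_speed": sum(device_speeds) / len(device_speeds),
        }
    return summary
```

Because the rank only records the order within an ISP and not the size of the gap, one large advantage at a single ISP can raise a device's average speed without improving its average rank by much.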


4.3 Time of day

The time of day is another parameter that may affect the downlink speed, mainly because the number of simultaneous users varies a great deal during the day. This is shown in Figure 4.6. Since the graph looks almost identical for all of the different ISPs, the measurements from all ISPs have been combined into a single graph.

Figure 4.6: Total number of measurements per time of day

The effect this has on the downlink speed can be seen in Figures 4.7, 4.8 and 4.9, where we show the downlink speed on the y-axis and time of day (divided into intervals of one hour) on the x-axis. We note that Tele2 and Telenor look fairly similar, and that Telia looks quite different from the other two. Telia has the highest average downlink speeds at the busiest times of the day, which could suggest that they adapt their network resources on some sort of on-demand schedule. The significant changes between 23:00-00:00 and 00:00-01:00, as well as between 10:00-11:00 and 11:00-12:00, suggest that some sort of adaptation is done by Telia. Telenor and Tele2 follow a more expected pattern, where the average downlink speed is at its peak when fewer people are using the network resources. We found this difference between the ISPs to be really surprising. The datasets have been checked for abnormalities in order to see if that could explain the big difference we see for Telia, but no such abnormalities could be found.


Figure 4.7: Average downlink speed per time of day for Telia

Figure 4.8: Average downlink speed per time of day for Tele2


Linear regression

To further investigate how the difference in user activity seen over the day (e.g., as in Figure 4.7) affects the downlink speed, we performed a linear regression between the number of measurements and the average downlink speed for each ISP. Figures 4.10, 4.11 and 4.12 present these results. In these graphs the number of measurements for each interval (with one hour time granularity) is used on the x-axis, and the y-axis still contains the average downlink speed. These graphs again highlight the difference in characteristics between the ISPs, where the time of day affects the speed similarly for Tele2 and Telenor, which is shown by the similarity in the linear equations. Telia still differs a lot from the other two, and again the time of day seems to have almost the opposite effect on the downlink speed compared to Tele2 and Telenor. Before we conducted these investigations, we expected that the downlink speed would decrease when the number of measurements increased. This expected pattern can be found in the graphs for Tele2 and Telenor, but the results for Telia differ significantly from our initial assumption.

Another thing that is worth noting is that the coefficient of determination is much higher for Tele2 and Telenor, which means that a larger portion of the variance is explained by the model. The value of $R^2$ is 0.87 for Tele2, 0.89 for Telenor and 0.21 for Telia. The variance can be seen in the graphs as well, where the dots are spread out quite far from the trendline in the graph for Telia, and follow the trendline fairly well in the graphs for Telenor and Tele2.

Figure 4.10: Linear regression for Telia, with average downlink speed on the y-axis and number of measurements (interval for each hour of the day) on the x-axis. The equation for the linear regression is $y = 8 \cdot 10^{-5}x + 14.394$ and the value for the coefficient of determination ($R^2$) is 0.20671.


Figure 4.11: Linear regression for Tele2, with average downlink speed on the y-axis and number of measurements (interval for each hour of the day) on the x-axis. The equation for the linear regression is $y = -0.0005x + 25.089$ and the value for the coefficient of determination ($R^2$) is 0.86754.

Figure 4.12: Linear regression for Telenor, with average downlink speed on the y-axis and number of measurements (interval for each hour of the day) on the x-axis. The equation for the linear regression is $y = -0.0004x + 21.475$ and the value for the coefficient of determination ($R^2$) is 0.89060.


4.4 Distance from city center

Another parameter that was taken into account was how close the measurement was to the city center. This study focused on five of the biggest cities in Sweden: Stockholm, Gothenburg, Malmö, Linköping and Uppsala. In each of the five cities a central point was determined as follows: Central station in Stockholm, Götaplatsen in Gothenburg, Folkets park in Malmö, Stora torget in Linköping and Stora torget in Uppsala.

Grid-based analysis

After deciding the central positions, the dataset was partitioned into a grid with the size 1600x1600 meters for each of the cities. For each single measurement within the grid, the distance to the central point was calculated using the Pythagorean theorem. On these 15 grids (5 for each ISP) a linear regression was made, resulting in the following equations and coefficients of determination (Table 4.2).

Linear regression, average downlink and distance to city center.

City         ISP       Linear equation          R^2
Stockholm    Telia     -0.035x + 21.736         $1 \times 10^{-4}$
Stockholm    Tele2     0.0033x + 22.183         0.001
Stockholm    Telenor   -0.363x + 25.563         0.006
Gothenburg   Telia     0.001ln(x) + 29.609      0.002
Gothenburg   Tele2     0.282x + 23.910          0.003
Gothenburg   Telenor   0.507x + 16.791          0.009
Malmö        Telia     0.022x + 22.188          $5 \times 10^{-5}$
Malmö        Tele2     0.865x + 16.781          0.026
Malmö        Telenor   0.607x + 15.153          0.013
Linköping    Telia     1.435x + 28.847          0.037
Linköping    Tele2     2.028x + 14.442          0.055
Linköping    Telenor   -0.858x + 20.548         $2 \times 10^{-6}$
Uppsala      Telia     1.429x + 12.459          0.029
Uppsala      Tele2     -0.515x + 19.646         0.013
Uppsala      Telenor   -0.544x + 28.858         0.005

Table 4.2: Linear regression in grids of size 1600x1600, with average downlink speed (Mbps) on the y-axis and distance to city center (km) on the x-axis
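The distance calculation used in the grid-based analysis can be sketched as below. The thesis does not describe how geographic coordinates were converted before applying the Pythagorean theorem, so this sketch assumes latitude and longitude in degrees and uses an equirectangular approximation, which is reasonable over the short distances considered here. The function name and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed constant)


def planar_distance_m(lat_deg, lon_deg, center_lat_deg, center_lon_deg):
    """Approximate distance in meters between a point and a city center.

    The east-west and north-south offsets are converted to meters with an
    equirectangular approximation and then combined with the Pythagorean
    theorem.
    """
    mean_lat = math.radians((lat_deg + center_lat_deg) / 2.0)
    dx = math.radians(lon_deg - center_lon_deg) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(lat_deg - center_lat_deg) * EARTH_RADIUS_M
    return math.hypot(dx, dy)
```

Dividing the result by 1000 gives the distance in kilometers, the unit used on the x-axis in Table 4.2.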

What is worth noting is that 10 of the 15 linear regressions have a positive slope, which suggests that the speed declines as one gets closer to the city center. Another thing that is noteworthy is the relatively low values of the coefficient of determination ($R^2$), which mean that the linear model explains very little of the variation in the data and make the results a bit unreliable. In comparison with the linear regressions made earlier in the report, these 15 regressions use a lot more data, so a lower value for $R^2$ is to be expected.


Circular experiment

Since the first experiment regarding distance to the city center gave low values for the coefficient of determination, its results could be a bit unreliable. Because of this we decided to also divide the measurements into 25 circles, with radii varying from 100 to 2,500 meters, where the smaller circles were not included in the larger ones. Since the results from Tele2 and Telenor had been so similar in the previous analyses, we decided to combine the measurements from these two ISPs in this analysis. The results of these measurements can be found in Table 4.3.

Linear regression, average downlink and distance to city center.

City         ISP                 Linear equation       R^2
Stockholm    Telenor and Tele2   0.197x + 22.247       0.099
Stockholm    Telia               0.1473x + 20.274      0.096
Gothenburg   Telenor and Tele2   0.0184x + 23.718      0.001
Gothenburg   Telia               -0.079x + 25.915      0.01
Malmö        Telenor and Tele2   0.4658x + 17.152      0.152
Malmö        Telia               0.3692x + 17.321      0.208
Linköping    Telenor and Tele2   0.0025x + 20.498      $9.2 \times 10^{-6}$
Linköping    Telia               -0.147x + 22.588      0.035
Uppsala      Telenor and Tele2   -0.2011x + 23.776     0.059
Uppsala      Telia               0.2025x + 18.479      0.065

Table 4.3: Linear regression of 25 circles of varying distance from city center, with average downlink speed (Mbps) on the y-axis and distance to city center (km) on the x-axis

These regressions show a slightly higher value for the coefficient of determination, which indicates a more reliable result than the linear regressions in Table 4.2. What is worth noting is that the slope is positive in seven of the ten linear regressions. With that in mind, it seems to indicate once again that the downlink speed increases as you move away from the city center. This seems to hold more strongly in the bigger cities, i.e., Stockholm, Gothenburg and Malmö, since five of the six linear regressions have a positive slope in these cities.


4.5 Density of measurements

Another parameter that we identified as potentially affecting the downlink speed was the number of measurements inside a specified area. For example, do regions with more measurements (and potentially more users) see higher or lower downlink speeds within a city? To answer this question, we partitioned the central 1600x1600 meter areas in each city in two different ways. The first way we divided the areas was by making four squares inside the 1600x1600 square. The other way in which we divided the data was by making a 3x3 grid of each of the 1600x1600 squares.

Grids

Next, we separated the 1600x1600 grids into three more grids of sizes 800x800, 400x400 and 200x200 meters, resulting in four grids in total for each combination of ISP and city (60 grids in total). Figure 4.13 shows how the differently sized grids are placed, with the city center in the center of every grid.


For each of the grids, a density was calculated as the number of measurements in that grid divided by its area. In order to see how the density affected the downlink speed, we made a linear regression with average downlink speed on the y-axis and the density on the x-axis. The resulting linear equations and coefficients of determination are shown in Table 4.4.

Linear regression, average downlink and density of measurements.

ISP        Linear equation      R2
Telia      0.592x + 21.264      0.021
Tele2      -0.124x + 23.555     0.041
Telenor    -0.192x + 23.146     0.238
Total      -0.104x + 22.986     0.059

Table 4.4: Linear regression with average downlink speed (Mbps) on the y-axis and density of measurements per region on the x-axis.
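The density calculation for the nested grids could look roughly like the sketch below. The input layout (east and north offsets from the city center in meters, plus the downlink speed) and the unit of measurements per square kilometer are assumptions; the text does not state which unit was used for the density.

```python
GRID_SIDES_M = [1600, 800, 400, 200]  # side lengths of the nested grids


def grid_stats(points, side_m):
    """Density and mean downlink for one square centered on the city center.

    `points` holds (east_m, north_m, downlink_mbps) tuples (assumed layout).
    Returns (measurements per km^2, mean downlink), or None for an empty grid.
    """
    half = side_m / 2
    speeds = [mbps for east, north, mbps in points
              if abs(east) <= half and abs(north) <= half]
    if not speeds:
        return None
    area_km2 = (side_m / 1000) ** 2
    return len(speeds) / area_km2, sum(speeds) / len(speeds)


def density_points(points):
    """One (density, mean downlink) pair per non-empty grid size."""
    stats = (grid_stats(points, side) for side in GRID_SIDES_M)
    return [s for s in stats if s is not None]
```

Each returned pair is one data point for a regression with density on the x-axis and average downlink speed on the y-axis. Because the grids are nested, the measurements in a smaller grid are also counted in every larger one.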

Squares

To further investigate how the density of measurements affected the downlink speed, the 1600x1600 regions were separated into 9 squares of equal size. Another linear regression was made with average downlink speed on the y-axis and the number of measurements in each square on the x-axis. Figure 4.14 shows these results.

Figure 4.14: Linear regression with average downlink on the y-axis and number of measurements per region in each square on the x-axis.
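A minimal sketch of the 3x3 partition is given below, assuming the measurements are available as east and north offsets in meters from the city center. The function name and the data layout are hypothetical.

```python
from collections import defaultdict


def square_stats(points, side_m=1600, cells_per_side=3):
    """Count and mean downlink per cell of a grid over a centered square.

    `points` holds (east_m, north_m, downlink_mbps) tuples (assumed layout).
    Returns {(row, col): (count, mean_downlink)} for the non-empty cells.
    """
    half = side_m / 2
    cell = side_m / cells_per_side
    speeds = defaultdict(list)
    for east, north, mbps in points:
        if abs(east) > half or abs(north) > half:
            continue  # outside the 1600x1600 m region
        # Clamp so that points exactly on the far edge land in the last cell.
        col = min(int((east + half) // cell), cells_per_side - 1)
        row = min(int((north + half) // cell), cells_per_side - 1)
        speeds[(row, col)].append(mbps)
    return {key: (len(v), sum(v) / len(v)) for key, v in speeds.items()}
```

The count and the mean of each cell then form one point in a regression of average downlink speed against number of measurements.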


5 Discussion

In this chapter we discuss the results from the previous chapter, as well as the limitations and advantages of the method we have used to get our results.

5.1 Method

Our method relies heavily on linear regression with a single variable. The upside of this approach is that it is easy to see how a particular parameter affects the downlink speed. The downside is that, compared to a multiple linear regression model, we get less information about how much the different parameters affect the downlink speed relative to one another. To explore our research questions further, we could have focused more on various types of multiple linear regression. We did perform a few multiple linear regressions, but they did not add any new information, and they were done on smaller datasets, which made them less reliable. For this reason they were not included in this thesis.

Another problem with multiple linear regression on our dataset is how to translate parameters such as time of day and type of phone into numeric values. This could be done by weighting each type of phone by its average downlink speed and each hourly interval by its total number of measurements. Such trials were made, but we found the results to be inconclusive, and because of this they were not included in this thesis. Another way in which our results could have been improved is by using a "category regressor", as in the methodology used by Borghol et al. [1]. That methodology provides an extended multiple linear regression, which gives the following equation,

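$$Y_i = \beta_0 + \sum_{p=1}^{P} X_{i,p}\,\beta_p + \sum_{k=1}^{K} Z_{i,k}\,\gamma_k + \varepsilon_i, \tag{5.1}$$

where $K$ is the number of categories, $P$ is the number of predictors, $\beta_p$ and $\gamma_k$ are the regression coefficients, $\varepsilon_i$ is the error term, and $Z_{i,k}$ is the category regressor, encoded as $Z_{i,k} = 1$ if measurement $i$ belongs to category $k$, and 0 otherwise. Such a regressor could be used for categorical parameters such as type of phone or time of day.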
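To make the idea of a category regressor concrete, the sketch below fits a regression of the form of Equation (5.1) using indicator columns for a categorical parameter such as phone type. It is a simplified illustration and not the procedure of Borghol et al.: the function names are hypothetical, the coefficients are obtained from the normal equations, and the first category is used as a baseline without its own column (an assumption made so that the intercept and the indicator columns are not linearly dependent).

```python
def design_row(x_values, category, categories):
    """Build [1, X_1..X_P, Z_2..Z_K] for one measurement.

    The first entry of `categories` is the baseline and gets no column.
    """
    dummies = [1.0 if category == c else 0.0 for c in categories[1:]]
    return [1.0] + list(x_values) + dummies


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_category_regression(rows, categories):
    """Least-squares coefficients from the normal equations (A^T A) b = A^T y.

    `rows` is a list of (x_values, category, y) tuples (assumed layout).
    Returns [intercept, one coefficient per predictor, one per non-baseline
    category].
    """
    a = [design_row(x, cat, categories) for x, cat, _ in rows]
    y = [target for _, _, target in rows]
    cols = len(a[0])
    ata = [[sum(r[i] * r[j] for r in a) for j in range(cols)]
           for i in range(cols)]
    aty = [sum(r[i] * t for r, t in zip(a, y)) for i in range(cols)]
    return solve(ata, aty)
```

With this encoding, the coefficient of an indicator column is the estimated shift in downlink speed for that category relative to the baseline category, with the numeric predictors held fixed.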


5.2 Results

An analysis that we would have liked to make is one based on the measurements' proximity to the base stations. To do that, we would have needed a map of the base station locations. Ten years ago, such a map was provided by the government agency Post- och Telestyrelsen (PTS), but it has since been taken down from their website. We tried contacting PTS as well as the different ISPs, but our efforts were unfortunately fruitless. This made such an examination impossible.

The first thing we noticed from the results was the big difference in downlink speed between rural and urban areas, where the urban areas had a significantly higher average downlink speed. The parameters we looked at were:

• Operator
• Type of phone
• Time of day, and the load in each hour
• Population density in 3x3 grids for each city
• Population density in buckets of sizes 1600x1600, 800x800, 400x400 and 200x200 meters
• Distance from the central position of each city

The average downlink speed differed somewhat between the operators, both in rural and in urban areas. Tele2 had the highest average downlink speed in urban areas, and Telia had the highest average downlink speed in rural areas (with Tele2 as a close second). The difference between the ISPs is an indication that they handle the downlink differently. This might be a result of their networks reacting differently to various parameters. In this case it would have been interesting to look at their cell towers. But, as noted above, PTS no longer shares this information, for security reasons.

The next parameter we looked at was how different types of phones affected the speed. Our graphs show that the phone type most likely affects the downlink speed, with modern phones such as the OnePlus and the Samsung SM models having a higher downlink speed on average. Owners of HTC phones, iPads and iPhones are likely to be more disappointed with our results. The impact of the type of phone on the downlink speed can be explained by the quality of the Network Interface Cards (NIC) and by which network technologies they are able to use. For example, some of the older phones may not be able to use 4G or other newer network technologies, which has a negative impact on their downlink speed. The patterns for the different ISPs are very similar, which indicates that the hierarchy between the phones is more or less the same, independent of the ISP. We do see a difference between the average downlink speeds for the same device in different networks, but that difference is proportional to the difference in average downlink speed between the networks. That makes it possible to compile a common table describing the hierarchy across all ISPs.

Another parameter we identified as having the potential to affect the downlink speed was the number of simultaneous users. The first way we examined this was by looking at how many users there were in different intervals of time. The number of users varied greatly between intervals, with just 6270 measurements in total between 04:00-05:00 and 107958 measurements between 20:00-21:00. With these densities in mind, we made a linear regression for each of the ISPs. The results were very similar for Tele2 and Telenor, but differed greatly from the results for Telia. Tele2 and Telenor followed a pattern where more measurements decreased the average downlink speed linearly, but for Telia the average downlink speed increased in the busier hourly intervals. The similarity between Tele2 and Telenor could be explained by the fact that they share their network infrastructure for 2G and 4G through a joint venture company called Net4Mobility. But the results from Telia are indeed peculiar. It makes sense that a high number of simultaneous users would decrease the downlink speed, as it does for Tele2 and Telenor, but with Telia we see a completely different pattern. One explanation could be that Telia adjusts its capacity for different times of the day, but we have found no evidence to support this theory, so it is pure speculation on our part. Another theory about why Telia performs better than the others when traffic peaks is that Telia's customers are more spread out over the country. That probably means that they do not need to share cell towers with as many other users as those who are gathered in urban areas. That, on the other hand, does not explain why Telia performs worse during times when the number of users is low. The fact that Telia shows a pattern opposite to that of Telenor and Tele2 might also indicate that time of day is not, in itself, a parameter that affects the downlink speed.

Another way in which we analysed the number of simultaneous users was by looking at the data from the five cities (Stockholm, Gothenburg, Malmö, Uppsala and Linköping). We started by picking a central position in each of the cities (Central station in Stockholm, Götaplatsen in Gothenburg, Folkets park in Malmö, Stora torget in Uppsala and Stora torget in Linköping) and using the longitude and latitude of that position. We then excluded all data that was further away than 1600 meters from that location, based on the longitude and latitude of the measurements. The datasets were then divided into 3x3 grids. We also created buckets of sizes 1600x1600, 800x800, 400x400 and 200x200 meters. For every cell of the grids and every bucket we calculated the number of measurements and the average downlink speed. A linear regression with average downlink speed on the y-axis and distance to the central position on the x-axis was then made for each combination of city and ISP, making a total of fifteen regressions. For ten of the fifteen linear regressions the trendline followed the pattern we expected, i.e., a lower average downlink speed for the measurements made close to the central position. But five of the regressions showed a pattern where the measurements closer to the central position had a higher average downlink speed. This was especially true for Telenor, with such a pattern in three out of its five linear regressions. Telenor and Telia both had the expected pattern in four out of their five linear regressions. The expected pattern was found in all of the regressions from Gothenburg and Malmö and in all but one from Linköping. We cannot find a good reason for this, but to investigate it further a multiple linear regression could be made with both distance to the central position and area density as variables, in order to see a clearer connection.
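The filtering step based on longitude and latitude could be sketched as follows. The equirectangular approximation, the Earth radius constant, and the function names are assumptions chosen for brevity; the text does not say how distances were actually computed.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def local_offsets_m(lat, lon, center_lat, center_lon):
    """Approximate east/north offsets in meters from a center point.

    Uses an equirectangular approximation, which is reasonable over a few
    kilometers but is only an assumed stand-in for the method actually used.
    """
    north = math.radians(lat - center_lat) * EARTH_RADIUS_M
    east = (math.radians(lon - center_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(center_lat)))
    return east, north


def within_radius(measurements, center_lat, center_lon, max_m=1600):
    """Keep measurements at most `max_m` meters from the central position.

    `measurements` holds (lat, lon, downlink_mbps) tuples (assumed layout).
    Returns (east_m, north_m, downlink_mbps) tuples for the kept points.
    """
    kept = []
    for lat, lon, mbps in measurements:
        east, north = local_offsets_m(lat, lon, center_lat, center_lon)
        if math.hypot(east, north) <= max_m:
            kept.append((east, north, mbps))
    return kept
```

The returned offsets can be used directly to assign each measurement to a grid cell or bucket, and `math.hypot(east, north)` gives the distance to the central position used on the x-axis of the regressions.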

A linear regression was also made for each of the operators with average downlink speed on the y-axis and the number of measurements divided by area on the x-axis. Here again, Tele2 and Telenor follow the pattern we expected, while Telia shows a pattern contrary to our expectations. A linear regression was also made with the data from all of the ISPs, and this regression followed the pattern we expected. What is a bit peculiar about the regression made with the data from all of the ISPs is the significantly higher value of R2. This suggests that the linear model accounts for more of the variation in the data, making it the linear regression with the most accurate result. With this in mind, we can conclude that the relation between average downlink speed and density of measurements in the different areas follows the pattern we expected.


The density of users in time and the density of nearby users in smaller areas may be parameters that should be considered in combination rather than separately. The distance to the center of the city might, in itself, not be a parameter that affects the downlink speed at all. But it is an indicator that often sums up the two previously mentioned parameters, density in time and density in geographical area. A user in the absolute center of a city is probably in a place with many other users. And when they are in the same area at the same time, they probably have the same pattern of behaviour, so they also use their devices at the same time. That might be why the downlink speed gets lower the closer you get to the center of a city.

5.3 Work in a wider context

Data sets such as the one used here may contain sensitive information that might cause harm if used wrongfully or if it ends up in the wrong hands [10]. Therefore we have signed an agreement with Linköping University saying that we agree to keep such information secret and secure. Sensitive information in such data sets could be a piece of information A in itself, or information A in combination with some other information B. We chose not to describe all the information that the data set contains, in order to keep the integrity, confidentiality and secrecy as high as possible [2]. We also only use aggregate values. Every decision made during this work has been made with ethical aspects in mind, and we have followed the engineering codes of ethics [5].


6 Conclusion

We have shown that the dataset from Bredbandskollen can be used to draw some conclusions concerning how some parameters affect the downlink speed. The problem with the dataset is that, even though it is a large collection of data, once you start to sort it into smaller pieces the number of measurements becomes too small to draw accurate and statistically backed conclusions. For example, one of the grids in Linköping for Telenor contained only two measurements. An even larger dataset would therefore have been needed to be more confident in the conclusions we have made.

So what are the conclusions? We have found that the ISP, the time of day, the type of phone and the proximity to a city affect the downlink speed in general. All in all, we have found somewhat satisfying answers to our research questions. What our analysis lacks is the relation between the different factors, i.e., how much they affect the downlink speed in comparison to one another. Some of our parameters are closely connected to each other, e.g., the time of day and the population density both relate to how many simultaneous users there are. The two parameters that stand on their own are the operator and the type of mobile unit used. This means that these two parameters would have to be treated differently from the other parameters in a multivariate model in order to achieve a good result.

In conclusion, if you want to make sure that you have a high downlink speed on your mobile unit, use one of the Samsung SM phones, place yourself close to Götaplatsen in Gothenburg, use Tele2 as your ISP and do this sometime between 04:00-05:00.

6.1 Possible future works

One area for further investigation is one that we have already discussed, i.e., how the different parameters relate to one another. One way to do this would be to find a way to associate numeric values with the parameters time of day and type of phone. Another thing that would improve this analysis would be a map of the locations of the cell towers used by the ISPs. Such a map has existed in the past, but is unfortunately no longer public. Judging by the responses we got from PTS and the ISPs, such a map will be quite difficult to get hold of, since the ISPs do not seem willing to share that information for security reasons. Hopefully this will change in the future.


Bibliography

[1] Youmna Borghol, Sebastien Ardon, Niklas Carlsson, Derek Eager, and Anirban Mahanti. “The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity”. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (Beijing, China). ACM, 2012, pp. 1186–1194.

[2] Folker den Braber, Ida Hogganvik, Mass Soldal Lund, Ketil Stølen, and Fredrik Vraalsen. “Model-based security analysis in seven steps — a guided tour to the CORAS method”. In: BT Technology Journal (2007).

[3] Alberto Garcia Estevez and Niklas Carlsson. “Geo-location-aware emulations for performance evaluation of mobile applications”. In: Proceedings of the Annual Conference on Wireless On-demand Network Systems and Services. 2014, pp. 73–76.

[4] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin. “Diversity in smartphone usage”. In: Proceedings of the ACM international conference on Mobile Systems, Applications and Services. ACM, 2010.

[5] Sven Ove Hansson. Teknik och etik. Avdelningen för Filosofi, Institutionen för Filosofi och Teknikhistoria, KTH, Stockholm., 2009.

[6] Keon Jang, Mongnam Han, Soohyun Cho, Hyung-Keun Ryu, Jaehwa Lee, Yeongseok Lee, and Sue B. Moon. “3G and 3.5G wireless network performance measured from moving cars and high-speed trains”. In: Proceedings of the ACM workshop on Mobile internet through cellular networks. 2009, pp. 19–24.

[7] Tova Linder, Pontus Persson, Anton Forsberg, Jakob Danielsson, and Niklas Carlsson. “On Using Crowd-sourced Network Measurements for Performance Prediction”. In: Proceedings of the Annual Conference on Wireless On-demand Network Systems and Services (WONS). 2016.

[8] Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to Linear Regression Analysis. Wiley, 2012.

[9] N.J.D. Nagelkerke. “A Note on a General Definition of the Coefficient of Determination”. In: Biometrika 78.3 (1991), pp. 691–692.

[10] Helen Nissenbaum. Privacy as Contextual Integrity. Washington law review association, 2004.


[11] Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schroedl. “Constrained K-means Clustering with Background Knowledge”. In: Proceedings of the International Conference on Machine Learning. 2001, pp. 577–584.

[12] Sanford Weisberg. Applied linear regression. Wiley, 2005.

[13] Jun Yao, S.S. Kanhere, and M. Hassan. “Improving QoS in High-Speed Mobility Using Bandwidth Maps”. In: IEEE Transactions on Mobile Computing (2012).

[14] Seung Min Yu and Seong-Lyun Kim. “Downlink Capacity and Base Station Density in Cellular Networks”. In: Proceedings of the Int.Symp.Modeling and Optimization in Mobile, Ad Hoc and Wireless networks (WiOpt). (Tsukuba Science City). 2015, pp. 119–124.
