Location Based Pre-caching and Network Coding in Smart Content Distribution

(1)

Location Based Pre-caching and Network

Coding in Smart Content Distribution

BO ZHANG

LUIS DELGADO ROMERA

Master’s Degree Project

Stockholm, Sweden January 2013

(2)

Abstract

Since the boom of the smart phones, there is a huge amount of ap-plications that deal with data located in the cloud. This fact can make these applications unavailable when the network is inaccessible due to coverage or congestion. A solution to these problems have been designed and developed for the Android OS in this master thesis. The approach is the application of location-based pre-caching, downloading the content of an application before the user enters in the zone where the application may use this content. Network coding has also been introduced in order to reduce the amount of data sent over the wireless networks. A caching scheme is introduced in binary network coding and applied to the problem of retransmission in wireless broadcast network. A binary network coding algorithm is designed which could asymptotically approach the efficiency of linear network coding with a much lower decoding complexity.

(3)

2 Background 7 2.1 Mobile applications . . . 7 2.1.1 Android OS . . . 7 2.1.2 Context-aware applications . . . 7 2.1.3 Location-aware applications . . . 8 2.1.4 Augmented reality . . . 8 2.2 Location in smartphones . . . 9 2.2.1 Location types . . . 10 2.2.2 Radio-based methods . . . 10 2.2.3 Propagation-based methods . . . 11 2.2.4 Fingerprinting . . . 11

2.3 Related work on location based system . . . 12

2.4 Network coding . . . 13

2.4.1 Problem introduction . . . 13

2.4.2 Wireless broadcast retransmission model . . . 13

2.4.3 System description . . . 14

(4)

2.5.1 ARQ . . . 15

2.5.2 Random Linear Network Coding (RLNC) . . . 15

2.5.3 Binary network coding . . . 16

3 Location-based pre-caching solution 18 3.1 Solution overview . . . 18 3.2 Basic concepts . . . 19 3.3 Location definition . . . 20 3.3.1 Geographical location . . . 20 3.3.2 Cellular location . . . 20 3.3.3 Wi-Fi location . . . 21 3.4 Zones definition . . . 21 3.4.1 Geographical zone . . . 21 3.4.2 Cellular zone . . . 22 3.4.3 Wi-Fi zone . . . 22 3.5 Technologies discussion . . . 22 3.6 Location methods . . . 24 3.6.1 GPS . . . 24

3.6.2 Cellular location method . . . 24

3.6.3 Cellular connected tower . . . 24

3.6.4 Cellular tower trilateration . . . 25

3.7 Area composition . . . 26

3.7.1 Pre-cached zone . . . 26

3.7.2 Download zone . . . 26

3.7.3 Tracking zone . . . 28

3.8 Zone detection . . . 28

3.8.1 Geographical zone detection . . . 28

3.8.2 Location provider influence . . . 30

3.9 Area zone determination . . . 30

3.9.1 Simple mode . . . 31

3.9.2 Hybrid mode . . . 32

3.10 Communication protocol . . . 33

(5)

4.1 Design motivation . . . 35

4.2 Solution and algorithm . . . 36

4.2.1 System scheme . . . 36 4.2.2 Decoding scheme . . . 37 4.2.3 Encoding scheme . . . 38 4.3 Complexity analysis . . . 42 4.3.1 Encoding complexity . . . 42 4.3.2 Decoding complexity . . . 43

4.3.3 Decoding space complexity . . . 43

5 Implementation 44 5.1 System overview . . . 44

5.2 Client architecture . . . 45

5.2.1 Location-based pre-caching service . . . 45

5.2.2 LBPC manager app . . . 49

5.2.3 Augmented reality application . . . 50

5.2.4 Experiments application . . . 50

5.3 Server architecture . . . 51

5.3.1 Transfer module . . . 52

5.3.2 Processor module . . . 52

5.3.3 Data sets module . . . 53

5.3.4 Persistence module . . . 53

5.3.5 Graphical user interface . . . 53

5.3.6 Resources monitor GUI . . . 54

6 System evaluation 55 6.1 Battery measurement . . . 56

6.1.1 Aim . . . 56

6.1.2 Setup . . . 56

6.1.3 Results . . . 56

6.2 Cell database availability . . . 57

6.2.1 Aim . . . 57

6.2.2 Setup . . . 57

(6)

6.3 Neighbor cells availability . . . 58

6.3.1 Aim . . . 58

6.3.2 Setup . . . 58

6.3.3 Result . . . 58

6.4 Cell methods accuracy . . . 59

6.4.1 Aim . . . 59

6.4.2 Setup . . . 60

6.4.3 Results . . . 60

6.5 NC efficiency simulation . . . 61

6.5.1 Simulation environment . . . 61

6.5.2 Simulation result on partial cache emptiness rate . . . 61

6.5.3 Simulation result on UE number . . . 62

6.5.4 Simulation result on total object number . . . 63

6.6 System evaluation discussion . . . 63

6.6.1 Technologies and zones . . . 63

6.6.2 Hybrid model recommendation . . . 64

6.6.3 NC scheme recommendation . . . 66

7 Conclusions 67

8 Future work 68

(7)

List of Figures

1 Monthly downloads in Android and iOS . . . 1

2 Data growth . . . 2

3 Augmented reality video . . . 9

4 Trilateration . . . 12

5 Triangulation . . . 12

6 Network coding in broadcast transmission . . . 14

7 System topology . . . 18

8 Solution overview . . . 19

9 Basic location concepts . . . 20

10 Location class diagram . . . 21

11 Round and Polygonal GeoZones . . . 22

12 Zones class diagram . . . 23

13 Connected cell tower transformation . . . 24

14 Cell tower trilateration . . . 25

15 Undesired download . . . 27

16 Round geographical zone detection . . . 29

17 Polygonal geographical zone detection . . . 29

18 Insight of sampling error . . . 30

19 Area zones determination . . . 31

20 Area determination activity diagram . . . 31

21 Download zone not detected . . . 32

22 Tracking and download detectors . . . 33

23 Communication protocol messages in map . . . 34

24 Communication protocol messages in time diagram . . . 35

25 Network Coding Transfer Structure . . . 37

27 Example Object Matrix . . . 41

29 Client architecture . . . 44

30 Server architecture . . . 45

31 Location module . . . 47

(8)

33 LBPC manager app screenshots . . . 50

34 3D model and video in augmented reality app . . . 51

35 Experiments app . . . 51

36 Resources monitor . . . 55

37 Screenshot of battery experiment . . . 56

38 Hours of battery life using only GSM and only GPS . . . 57

39 Path followed in Cell location experiment . . . 58

40 Towers seen in Cell location experiment . . . 59

41 Number of neighbors seen . . . 60

42 Distances to GSM towers while connected to them . . . 60

43 Impact of partial cache emptiness rate pi . . . 62

44 Impact of varying number of receivers m . . . 62

45 Impact of total object number n, pi= 0.2, m = 100 . . . 63

46 Ericsson building use case . . . 66

47 Variation of battery of hybrid model considering distance . . . . 66

48 NC complexity analysis . . . 67

49 Demo setup . . . 72

50 Main screen of LBPC manager app . . . 73

51 Preferences screen, select press map method . . . 73

52 Location simulated in the map . . . 74

53 Empty the cache of objects . . . 74

54 Augmented Reality application does not show content . . . 75

55 Enter in the download zone triggers the download . . . 75

56 Cache full again . . . 76

(9)

List of Tables

1 Tasks developed by each student . . . 6 2 Sample Packet Matrix . . . 15 3 Maximum distances to towers . . . 61 4 Accuracy and battery consumption of di↵erent technologies . . . 64 5 Usage of technologies for each zone . . . 65

(10)

List of abbreviations

OS Operative System

GPS Global Positioning System SDK Software Development Kit GUI Graphic User Interface

GSM Global System for Mobile Communications CPU Central Processing Unit

RAM Random Access Memory CD Compact Disc

3G 3rd Generation of mobile telecommunications technology KTH Kungliga Tekniska Hgskolan - Royal Institute of Technology NFC Near Field Communication

IDE Integrated Development Environment ADT Android Development Tools

XML Extensible Markup Language USB Universal Serial Bus

PoI Point of Interest

RSSI Radio Signal Strength Indication ToA Time of Arrival

AoA Angle of Arrival ToF Time of Flight

TCP Transmission Control Protocol UDP User Datagram Protocol IP Internet Protocol

API Application Programming Inteface CellID GSM cell identificator

SSID Service Set identificator AP Access Point

ACK Acknowledgement NC Network coding UE User Equipment

ARQ Automatic Repeat-reQuest RLNC Random Linear Network Coding

NCWBR Network Coding in Wireless Broadcast Retransmission CNCWBR Cache Network Coding in Wireless Broadcast Retransmission VCHA Vertex Colouring-based Heuristic Algorithm

WBRBNC Wireless Broadcast Retransmission using Binary Network Coding PDS Partial Decode Storage

(11)

1 Introduction

1.1 Smartphones and mobile applications

The explosion of mobile devices in the latest years has changed the way we interact with the world. The traditional cellular phones have gradually evolved into the so-called smartphones. A typical high-end smartphone manufactured in 2012 has a CPU unit of 1.5 GHz, a RAM memory of 2 GB, a tactile screen of around 5”, a camera of 5 megapixels, built-in location and motion sensors, and wireless capabilities. And all this encapsulated in less than 200 grams.

The actual and potential applications of a hand-held device with such fea-tures is huge. Some common functionalities are traditional voice communica-tion, messaging, email, web-browsing, social networking, news feeds reading, media streaming or gaming.

Figure 1 illustrates the amount of downloads in the two main mobile oper-ative systems: Android and iOS. The point is not the fight against them to be the leader in the market, but the fact that the amount of monthly downloads is measured in billions. This has been extracted from a study made by Xyo [22].

Figure 1: Monthly downloads in Android and iOS

1.2 Data in the cloud

The cloud is a term to refer to the Internet. Say that the data of an application is in the cloud is to say that it is available and obtainable using the standard

(12)

protocols of the Internet. A cloud is the symbol that has traditionally been used in networking to represent in diagrams a complex infrastructure; that is where the term comes from.

Some years ago, when the access to the Internet was not so generalized and the connections were not as fast as now, the software was designed to be executed in our computers with the data allocated in that same machine. People were buying their operative systems and office suits in cardboard boxes, making back up copies in floppy disks and playing their music from their own CD’s.

Nowadays, the tendency is that our data is allocated somewhere on the Internet, and just with a web browser or a thin application we can access to almost any kind of service. Our documents can be accessed and modified using services like Google Docs, we play video and audio using services like Youtube and Spotify, we share our files using Dropbox and we do it from any computer or mobile device, anywhere in the world.

The low cost and high transmission rate of the communication networks have changed the way that the applications are conceived. For many applications, the data is online, in the cloud and accessible from a wide variety of devices, such as computers, tablets, mobile phones or televisions. Mobile applications are sometimes just simple tools that let the user interact with this data in a friendly way.

The amount of data flowing in the networks is increasing exponentially and it will continue, as it illustrates Figure 2 from an study carried out by Ericsson in 2011. The growth of data accessed by mobile devices is also exponential. Finding smart ways to handle all this data is a key issue to minimize costs for both the client and the operators.

(13)

1.3 Caching

The procedure of keeping the data in the memory of the client device to avoid redownload it in the future is called caching. The term pre-caching refers to the prediction of the necessity of some data, requesting its download before it is actually used, improving the user experience, that will not have to wait until the data is downloaded.

Downloading data in mobile devices is expensive in terms of battery con-sumption, a not so big resource, so thinking of a good strategy of which data to download and when to download it is essential to reduce the battery consump-tion. Besides, redownloading data would unnecessarily increment the execution time a↵ecting to the performance and thus the user experience.

1.4 Out of coverage zones

Wireless networks like 3G are spread all over the world. But still in the coun-tryside or in some rural areas where the usage of mobile phones is not high, there could be zones where there is no coverage. Besides, we could also observe out of coverage zones in underground areas or inside buildings where the signal cannot go through the walls.

One clear disadvantage of applications that work with data in the cloud is that they may become unusable in out of coverage zones if they rely on download the data just when it is needed. One way of solving this is downloading the data in advance, that is, pre-cache the data.

1.5 Location in mobile devices

All the new generation smartphones come with a built-in GPS receiver, so the device can know its location. However, GPS is very battery consuming and does not work indoors, so less accurate alternative methods, especially based on radio signals received (telephony, Wi-Fi access points), are also being used as a complement to GPS.

1.6 Augmented reality

Augmented reality applications are a new tendency of applications whose aim is to enhance by computer-generated techniques the reality that we see, hear, feel or smell. In smartphones, they generally make use of the camera and location sensors of the phone to detect real life elements (monuments while visiting a city, products in a supermarket), and connect to a server to retrieve information about it, displaying it normally as an overlay on the image captured by the camera.

(14)

1.7 Network coding

Network coding is a method of trying to attaining maximum information flow in the network by using coding. It applicable in improving network efficiency in many di↵erent network conditions.

1.8 Problem

The problem to address in this master thesis is to find an e↵ective way to apply pre-caching in mobile devices based on the location of the user, in a way that it could be running in the background providing continuous service for applications whose data is in the cloud, even when entering in out of coverage zones.

1.9 Hypothesis

It is possible to design a service running in the background of a mobile device that pre-caches the content of mobile applications based on the location of the user, providing a continuous service for applications whose content is in the cloud. This could be done without a↵ecting considerably to the battery life of the device and without downloading an appreciable amount of data unnecessar-ily.

1.10 Approach

The out of coverage zones will be geographical regions predefined in the system. The device will monitor its location to trigger the pre-cache of the content before the user actually enters in an out of coverage zone. We will look for di↵erent location technologies and strategies to reduce the utilisation of resources on the device, trying to solve the di↵erent trade-o↵s between accuracy and resources consumption.

1.11 Goal

Design and implement a location-based pre-caching (LBPC) service for mobile applications that make our target applications available regardless of the cover-age of the location.

1.12 Tasks

• Design and implement a background location-based pre-caching ser-vice for the Android OS.

• Develop an augmented reality application for Android making use of the pre-caching service implemented.

(15)

• Design the communication protocol with the server that will host the zones definition in the system and the content of the application imple-mented.

• Carry out a testbed of experiments to evaluate the service, trying to analyze the availability, e↵ectiveness and resources consumption of each of the location methods implemented. Aspects such as the influence of the moving speed of the user, the initial state of the cache and the actual bandwidth should also be taken into account.

• Implement a real time resources consumption monitoring system, where we could se on real time from the screen of a computer the resources consumption (CPU, battery, data transmitted) in an Android device while running our service.

1.13 Context of the project

The master thesis work being described is part of a two people project done for Ericsson Research in Kista, Stockholm. Both Luis Delgado Romera and Bo Zhang are master students at KTH. Luis Delgado has been responsible of the client development, the design the communication protocol, the development of the real time usage data module, the design of the location techniques and the development of the augmented reality application.

Bo Zhang has been responsible of the development of the server, the design and development of the communication protocol and the introduction of network coding techniques in the pre-caching service. Table 1 shows the di↵erent tasks carried out by each of the students.

(16)

Table 1: Tasks developed by each student

(17)

2 Background

2.1 Mobile applications

A mobile application (or mobile app) is a software application designed to run on smartphones, tablets or other mobile devices. With the emergence of the smartphones, the market of mobile applications has experienced a boom in the last years, as shown in Figure 1.

2.1.1 Android OS

Android is a Linux-based operative system conceived for mobile devices. It is developed by Google in association with the Open Handset Alliance [2]. Some features provided by Android-enabled devices are telephony, messaging, media support, multiple language support, internet access, multitouch screen support, wireless communication, accelerometer, gyroscopes, proximity sensors, NFC, tethering, etc. [11]

The development of Android Applications is based on the Java program-ming language using the Android Software Development Toolkit (SDK), which includes also XML support for simpler components as GUI layouts or configu-ration. The official integrated development environment (IDE) is Eclipse, with the Android Development Tools (ADT) plugin. The SDK provides the Android emulator, avoiding the need of a real device to develop Android applications. However, the developed applications can also be immediately tested in a real device connecting the phone to the computer via USB.

Android has been chosen as the mobile OS to develop this master thesis work for its huge set of features, its openness, big open community, good docu-mentation and easiness for developers.

2.1.2 Context-aware applications

When humans talk to humans, they are able to use implicit situational in-formation, the context, making the conversations richer. However, while the interaction is between humans and computers, taking advantage of the context is not that easy.

A lot of research has been done in context awareness for mobile systems both in the fields of handheld and ubiquitous computing. Dey et al. carried out a survey to provide a better understanding in what the context is and what context-aware applications are [24].

They collect and analyze di↵erent definitions for context and they finally provide their own one:

(18)

situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an appli-cation, including the user and the application themselves.

Contextual information comprises location, time, identity of the user, ac-tivity being performed, etc. Dey et al. also provide in [24] a definition of context-aware systems:

A system is context-aware if it uses the context to provide relevant information and/or services to the user, when relevancy depends on the user’s tasks.

2.1.3 Location-aware applications

Location is one of the most important components of the context. Location-aware applications provide information or services based on their physical lo-cation [31]. These applilo-cations can provide a simple “you are here” in a map, traffic information, points of interest close to the user, reviews of restaurants nearby, show your location with your friends to meet them, etc.

2.1.4 Augmented reality

Augmented reality is a type of location-aware applications that enriches the perspective captured by the user superimposing objects on the real world. In smartphones, the reality is captured by means of the video streaming produced by the camera of the device. The application will augment the reality view with virtual objects. Figure 3 is a screenshot of the application developed in this thesis. It recognizes the box of a mobile phone and displays a video of its features on top.

There can be di↵erent tracking methods for aligning a virtual object with a 3-dimensional coordinate in the reality view. There are basically two methods, which can also be combined:

Location-based tracking including location sensors like GPS, digital com-pass and accelerometer

Optical tracking using image recognition

A Point of Interest (PoI) is an individual data item associated with a ge-ographical location (longitude, latitude, altitude) or a visual pattern (marker, book cover). A virtual object is normally rendered over a detected PoI. Virtual objects can be 2D images, 3D models, videos, etc.

(19)

Figure 3: Augmented reality video

2.1.4.1 Metaio Mobile SDK For the application developed in this thesis work, some frameworks providing augmented reality features were reviewed and tested, trying to obtain a solution that would not require big development ef-forts, as this is not the goal pursued in this thesis, but only a use case of the pre-caching service.

These characteristics were taken into account to choose the framework: eas-iness to develop a simple case, features, good documentation, opensource, and performance. Among the frameworks reviewed, the one that best fitted in our requirements was the Metaio Mobile SDK [12], which provides a good documen-tation and examples that can easily be modified to adapt it to the developer needs, provides both optical and location-based tracking and images, videos and 3D models for virtual objects. It is not opensource, but it can be used even with commercial purposes (only a watermark is printed in the applica-tion). We wanted the framework to be opensource, but it was by far the best of the solutions tried, so we decided to adopt the Metaio framework anyway.

Other frameworks tried were Wikitude SDK [21], LibreGeoSocial [10], OpenCV [15] and Mixare [13]. After having taking the decision, we also discovered the Qualcomm’s Vuforia framework [3], which has similar features to the Metaio framework, but the implementation phase had already started.

2.2 Location in smartphones

Most of the smartphone applications use only GPS to obtain the location of the user, as it directly provides a geographical coordinate without need of complex algorithms or methods. Despite its easiness, GPS has also important drawbacks,

(20)

as it does not work indoors and consumes an important amount of battery [39]. For that reason, alternative methods have been investigated in this thesis. In order to completely understand this methods, research has been done starting by the basics.

Smartphones provide di↵erent resources that can be used to obtain location information. They are mainly based on radio signals (GPS, WiFi and cellular communications), but also in di↵erent phenomena like the acceleration of the device (accelerometer) or the electromagnetic field (compass).

2.2.1 Location types

There are two basic ways to express a location [29]:

Physical location expressed in a coordinate system

Symbolic location determining if the target is inside a symbolic predefined zone: “home”, “work”, etc.

Symbolic location can easily be obtained from absolute location, just by defining the correspondence between range of absolute geographical coordinates and zones. While looking for symbolic location, there is no real need for accu-racy, but rather for reliability, as there is no need to know in which part of the zone the user is, but just if an element is inside or not [18].

2.2.2 Radio-based methods

Radio-based methods consist of the estimation of the location of an element based on:

• The known locations of one or more reference points, called beacons • The estimated distance or orientation of the element respect to each of

the beacons

For that purposes, the following parameters in the receptor can be used:

• Radio Signal Strength Indications (RSSI), power of the signal re-ceived by the antenna.

• Time of Arrival (ToA), sometimes called time of flight (ToF), is the travel time of a radio signal from a single transmitter to a remote single receiver [19]. By the relation between light speed in vacuum and the carrier frequency of a signal the time is a measure for the distance between transmitter and receiver. ToA is used by location systems like GPS.

(21)

• Angle of Arrival (AoA), which expresses the angle of the emitter respect to the receiver [37].

There are two basic families of methods to obtain location information based on radio signals. Propagation-based methods consider only a theoretical behaviour of the propagation of the signal, including or not phenomena like multipath or occlusion. Fingerprinting includes however an o✏ine setup phase in which the actual signals received from di↵erent beacons are actually recorded creating a radio map.

2.2.3 Propagation-based methods

The propagation-based methods can be based on RSSI, ToA or AoA. RSSI can be translated into distances from beacon points using radio prop-agation models [34]: Pr= Pt ✓ 4⇡d ◆n GtGr (1)

Ptand Prrepresents the power transmitted and received, the radio

wave-length, d the distance to the antenna, n the loss coefficient and Gt and Grthe

gains of the transmitter and receiver antennas respectively.

RSSI-based methods calculate the location of the user based on the distance to di↵erent beacons using the propagation model introduced above. This meth-ods have turned out to be inaccurate in di↵erent investigations [34]. Phenomena like multipath, interference and other shadowing e↵ects can significantly a↵ect to the error introduced by these methods, especially in indoor environments.

Once the distance to each beacon is estimated, the location of the element can be estimated using algorithms based on trilateration. Knowing the dis-tance to three beacons and assuming that it contains no errors, the absolute position of the element can be obtained as illustrated in Figure 4.

If however the known parameter from the beacon is the angle of arrival, AoA, when can apply triangulation as shown in Figure 5 to estimate the location.

2.2.4 Fingerprinting

An alternative way to obtain location information based on the RSSI is finger-printing. It consists of two di↵erent phases:

O✏ine phase where a radio map of the target area is built. The RSSI received from di↵erent beacons is recorded using a device with the same character-istics as the ones that will be used in the online phase. The records will be stored for di↵erent locations within the target area.

(22)

Figure 4: Trilateration

Figure 5: Triangulation

Online phase where the device estimates its location based on the best match-ing among the records stored in the radio map. The localization algorithms make use of deterministic and probabilistic techniques.

Fingerprinting techniques are specially appropriate in the range of frequen-cies of GSM and WiFi for two di↵erent reasons: the signal strength at those frequencies present important spatial variability and also a reliably consistency in time [34].

Concerning WiFi, several research groups have done big e↵orts in trying to create a location system based on fingerprinting. Some examples are Radar [23], Horus [43] or Compass, but they generally require specific and complex hardware.

2.3 Related work on location based system

Some systems and investigations have been reviewed to design and build our system. Ambulation, by Ryder et al is a mobility monitoring system using Android for patients who su↵er from mobility-a↵ecting chronic diseases such as Parkinson or muscular dystrophy. This system makes use of GPS taking into account energy usage, switching it o↵ when detecting being indoors using the accelerometer.

RAPS (Rate-adaptive positioning system), by Paek et al [36] takes advan-tage of the observation that GPS is generally less accurate in urban areas. It

(23)

also efficiently estimates user movement using a duty-cycled accelerometer, and utilizes Bluetooth communication to reduce position uncertainty among neigh-boring devices. Finally, it employs celltower-RSS blacklisting to detect GPS unavailability (e.g., indoors) and avoid turning on GPS in these cases.

LifeTag by Rekimoto et al is a Wi-Fi based continuous location logging system for life pattern analysis [38]. Nel Salmara from Telecom SudParis carried out an investigation on symbolic 3D Wi-Fi indoor positioning [18].

2.4 Network coding

2.4.1 Problem introduction

With the growth of modern application usage, the network transmission require-ment is increasing rapidly. Some mobile applications / services for example the proposed AR application has the following characters:

1. Mass Data : a large amount of data are required for the service.

2. Location based distribution : The UEs using the same service in same geography locations are tent to use same set of data.

3. Mobility : All the transmissions are conducted through wireless channels.

One natural character of wireless communication is broadcast. So broadcast could be used to improve the data transmission efficiency in proposed condition. For example, the UEs U1and U2are communicating with the same wireless data

sender(e.g. Base Station). Both of them require data P1 and P2. Since it is a

broadcast channel, rather than transferring P1and P2 separately to U1 and U2,

it is better to transmit P1and P2 through broadcast channel.

Moreover, if U E1 and U E2 already has some of the data previously, the

efficiency could be improved by using network coding. Shown as in figure 6, both U1 and U2require data P1and P2. U1already has P1and U2 already has

P2. The most common scheme, the broadcaster sends the packets P1 and P2

separately. Another way is using network coding scheme which handles it by broadcasting P1 P2during the transmission. The UEs could recover the data

by decoding; U1 could recover P2 from P1 (P1 P2) and U2could recover P1

form P2 (P1 P2) [35]. By introducing network coding, only one packet (

P1 P2) is sent instead of two packets (P1and P2 separately).

2.4.2 Wireless broadcast retransmission model

Another natural character of wireless communication is its unstable and unre-liable communication channel. The signal-to-noise ratio is significantly lower than wired channel. The consequence to digital wireless network is that packet losses are encountered at various channel conditions. In order to deal with

(24)

Figure 6: Network coding in broadcast transmission

packet loss, retransmission schemes could be introduced to improve the relia-bility. Especially in the conditions of high packet loss rate, the efficiency of the schemes is very important.

Network coding could also be applied in the retransmission schemes. It is a similar problem as the transmission scheme while UEs have partial data which is proposed in last chapter. Figure 6 could also be translated as during the transmission in lossy broadcast channel, the packet P2 is lost by U1, and packet

P1 is lost by U2.

In this paper, we would consider as the same problem and all existing scheme dealing any one of the problem could be considered as a possible solution for both of the two.

2.4.3 System description

Generalizing the problem into the scheme of broadcasting multiple packets to multiple UEs, applying network coding would reduce the number of packets transferred. In this case, one of the key problem is to find the optimal coding for the system which could minimize the overall number of retransmission.

The target system is defined as follows:

1. sender broadcasts Objects (P1, P2, ..., PN) to m receivers (U1, U2, ..., UM)

through a wireless channel.

(25)

already has follows the Bernoulli distribution according to pi. The packet

partial cache rate and partial cache distribution among di↵erent receivers are independent.

3. The sender has the knowledge of the partial cache distribution information for all UEs.

4. All transmissions are guaranteed to be received by all UEs.

In the proposed system, a object matrix T could be constructed and main-tained for recording the partial cache distribution knowledge and transmission status according to the (ACK/N ACK) acknowledgements. The rows repre-sents the receivers and the columns reprerepre-sents the Objects. When the kth object is already in cache by the ith receiver, then set the corresponding cell in the matrix Ti,k to 0. Else if the object is required by the receiver, set the

cell Tk,i to 1. As shown in the sample matrix, six objects are required by four

receivers. Neither of the UEs has P1and all UEs already has P4in cache, while

other 4 object are in cache of some of the receivers.

In this paper, T always represents the 2-dimension packet matrix, m, n, and pi always represents the receiver number, the total object number and the

corresponding partial cache emptiness rate for ith receiver.

Table 2: Sample Packet Matrix P1 P2 P3 P4 P5 P6

U1 1 1 1 0 0 0

U2 1 0 1 0 1 0

U3 1 1 0 0 1 0

U4 1 1 0 0 0 1

2.5 Existing NC Schemes on proposed system

2.5.1 ARQ

All Objects are transmitted if any of the receiver requires the packet.

2.5.2 Random Linear Network Coding (RLNC)

Linear network coding was first proposed in [32], and it was extended into Random Linear Network Coding[28] which become one of the most common NC solution. RLNC was applied in system in [40].

In RLNC, for j objects P1, P2, ..., Pj needed to be transferred in total, each

(26)

R

i

=

j

P

t=1

g

i,t

P

t

The coding coefficients are randomly selected in a sufficient large finite field GF (x). If receiver R require k objects, after k encoded objects are received in transmission phase, receiver R can recover all objects in two steps:

First, for all Pf which are received in transmission phase, and for all Ri,

calculate R0_i= Ri gi,fPf .Then a k-linear equation is generated as

0 B B @ gi1,t1 gi1,t2 ... gi1,tk gi2,t1 gi2,t2 ... gi2,tk ... ... ... ... gik,t1 gik,t2 ... gik,tk 1 C C A 0 B B @ Pt1 Pt2 ... Ptk 1 C C A = 0 B B B @ R1 0 R2 0 ... Rk 0 1 C C C A

Second, recover all the retransmissions by resolving the linear equation as 0 B B @ Pt1 Pt2 ... Ptk 1 C C A = 0 B B @ gi1,t1 gi1,t2 ... gi1,tk gi2,t1 gi2,t2 ... gi2,tk ... ... ... ... gik,t1 gik,t2 ... gik,tk 1 C C A 10 B B B @ R1 0 R2 0 ... Rk 0 1 C C C A

In the proposed system, using RLNC could achieve the theoretical minimum number of transmissions kmax = max

1im n

P

j=1

Ti,j. However, it increases the

com-plexity of the decoding procedure. Considering the whole decoding procedure, k transmission is needed by receiver R, The complexity for resolving k linear equations is O(k3_{). For specific total objects number n and partial cache rate}

pi, since in average k = pin, k is linear related to n for a specific pi. So the

average complexity to decode one single retransmission is O(n2_{). More over,}

calculations on finite field would have be executed on the whole block during the decoding procedure which is much more slower than XOR operations. For RLNC, due to its complexity of the decoding procedure and limited execution resources on the receiver, it is not feasible to resolve k-linear network coding equations in a reasonable time to fit a real time transfer, if k is big. Considering of implementing network coding in lower layer network protocols, RLNC has also a lower feasibility than binary ones for its complexity.

2.5.3 Binary network coding

During the transmission, binary network coding could also be applied. All the transmitted objects are encoded by XORing several of the original objects in binary. For its simplicity, only XOR operation is needed during the decode process, which is much more simpler than the calculations in finite field as in RLNC. It could be possible to apply this network coding scheme in the lower layers instead of application layer in a wireless broadcast network. [35] first

(27)

pro-posed the binary network coding schema with 2 receivers. [41] extended it to multiple receivers, and proposed NCWBR algorithm which is trying to resolve the problem how to generate the set of binary encoded transmissions efficiently. It is been proven that to find the best scheme is a NP-hard problem[25][30]. In papers [42],[26],[33],[27], several binary network coding algorithms have been designed and proposed to find a more efficient algorithm which is also feasi-ble in complexity. Xuanmin Lu improved NCWBR in his paper [42]. Weiwei F developed a Vertex Colouring-based Heuristic Algorithm (VCHA)[26] which constructs a graph based on the matrix in order to find the object which has the least possibilities to encode with other objects. Lu Lu used the bucket theory to create his grouping strategy[33], while L.Gou applied greedy methodology in designing his NC algorithm WBRBNC [27].

(28)

3 Location-based pre-caching solution

3.1 Solution overview

The system developed applies pre-caching based on the location of the user. The architecture of the system will be composed of:

The central server It will host the information about the areas where pre-caching is performed and the content of the example augmented reality application. This server will be one of our computers located in our office in Kista, and it is publicly accessible using TCP/IP.

The Android clients They execute the augmented reality application with the location-based pre-caching service running in the background. They will only communicate with the server and not among each other.

This is illustrated in Figure 7.

Figure 7: System topology

The approach is defining di↵erent zones for each area in which pre-caching will be applied. Each area is divided in three di↵erent zones:

Pre-caching zone Where the content must be already downloaded to be used Download zone Where the content is actually downloaded

(29)

The idea is that the mobile client will track the location of the device, and when it is detected to have entered in the download zone, it will automatically trigger the download of the content from the server, so it will be ready when entering inside the pre-cached zone. The main idea is illustrated in Figure 8.

Figure 8: Solution overview

Di↵erent methods to obtain the location of the user will be studied and each one will have a di↵erent accuracies and resources consumption. The way that this tradeo↵ was resolved will also be explained in this section.

3.2 Basic concepts

We have defined the following terms related to location:

Location It defines a geographical point in the system. The most basic way to think of a location is a geographical coordinate like the ones provided by GPS, but we will also define them in terms of cellular and Wi-Fi signals received from that point.

Zone It defines a geographical surface on the earth. As it is explained in Section 3.4, a zone can be defined using geographical coordinates or in terms of cellular and Wi-Fi signal that could be received from that zone.

Area It defines a target place in which pre-caching will be applied. To perform the pre-caching, three di↵erent zones are defined in every area: pre-cached zone, download zone and tracking zone. This will be covered in detail in section 3.7.

(30)

Region It defines a set of areas that can be retrieved at the same time from the server, avoiding requesting them individually to avoid unnecessary messages. It is not a very relevant concept in the system, the important ones are the described above.

Figure 9: Basic location concepts

3.3 Location definition

A location is a concrete position on the earth. The easiest way to think of a location is a geographical coordinate, but we have also defined them in terms of cellular and Wi-Fi signals. In section 3.8 we will explain how to determine if a location is inside a zone.

3.3.1 Geographical location

A geographical location, or GeoLocation, is a location defined by its geographical coordinate. Using the Android API, we can obtain the geographical location of the device using GPS. As we will cover in section 3.9, we have also implemented our own algorithms to obtain a geographical location based on the trilateration of cellular towers. All these methods have di↵erent accuracy, so the defini-tion of our geographical locadefini-tion has also an accuracy parameter providing the maximum expected error in meters.

3.3.2 Cellular location

A mobile device using the a cellular system like GSM can only be connected to a single tower. However, it can still receive the signal of other towers, called

(31)

the neighbor cells. The definition of a CellLocation includes the CellID and the received signal strength of the connected and the neighbor cells.

3.3.3 Wi-Fi location

Similarly to the cellular definition, we define a Wi-Fi location as the SSID and signal strength of the received Access Points (AP).

Figure 10 shows a class diagram of the di↵erent locations to illustrate its definitions.

Figure 10: Location class diagram

3.4 Zones definition

As it was explained above, a zone is the definition of a surface on the earth. The easiest way to think of a zone is a geographical definition but we have also defined them based on the Wi-Fi or GSM signals that can be observed from it.

3.4.1 Geographical zone

A Geographical zone is a zone defined using geographical coordinates. Two di↵erent zones have been implemented, illustrated in Figure 11.

(32)

Round geographical zone Defined by the geographical coordinate of the cen-ter and the radius in mecen-ters.

Polygonal geographical zone Defined by an ordered list of geographical co-ordinates. This points will be the vertices of a polygon.

Figure 11: Round and Polygonal GeoZones

3.4.2 Cellular zone

A cellular zone is a zone defined by the CellID of the towers that can be received from that zone. Optionally, a minimum signal strength value can be associated to each tower to make the definition more restrictive.

3.4.3 Wi-Fi zone

A Wi-Fi zone is a zone defined by the SSID of the Wi-Fi access points that can be received from that zone. Optionally, a minimum signal strength value can be associated to each AP to make the definition more restrictive.

Figure 12 shows the class diagram of the di↵erent zones implemented and the composition of an area based on zones.

3.5 Technologies discussion

We developed an open implementation allowing the definition of zones and lo-cations using di↵erent technologies as described above. However, in practice, not all the definitions have been used.

There are di↵erent problems while using Wi-Fi for location:

• If using existing AP, they can be switched on and o↵ unexpectedly • If using dedicated AP, several AP must be placed in strategical places,

(33)

Figure 12: Zones class diagram

• The propagation of WiFi signals vary with time and depends on the struc-ture of the buildings (multipath, occlusion. . . ).

We concluded that using Wi-Fi was a complicated problem to resolve, and we left it out of the scope. However, the implementation of Wi-Fi location and zones is done, although not fully supported. For that reason we have decided to mention it here and it will be included as future work.

Regarding cellular location information, there are two ways we can make use of it:

• Define both locations and zones in terms of cellIDs.

• Translate the cellular location into a geographical location using the loca-tion of the towers, so the same geographical zones can be used for both the locations obtained with GPS and from cellular methods.

The second one is more appropriate, as let us define the zones only in ge-ographical terms, making also easier the comparison of performances and re-sources consumption between these di↵erent technologies.

(34)

3.6 Location methods

In this section we will explain the di↵erent location methods, that is, the di↵er-ent modules in the system that provide any kind of Location instances. Imple-mentation aspects will not be covered here.

3.6.1 GPS

The GPS method will directly retrieve a GeoLocation instance every Tsampleseconds

based on the GPS signals obtained from the satellites. It will also provide the accuracy parameter as the maximum error en meters. This error would normally not be higher than 15-20 meters.

3.6.2 Cellular location method

The cellular location method obtains the cell information of connected and neighbor cells to retrieve an instance of CellLocation.

3.6.3 Cellular connected tower

The first element needed to be able to use cellular methods is a way to obtain the geographical of a cellular tower. The operators do not publish this information, and there are di↵erent projects to try to create a global cell id database.

OpenCellID [14], an opensource community-based database was studied, but while doing the tests not all the location of the cellIDs received were available. Besides, a Google API was also tried. This API is not officially documented by Google, but developers have been using them for years [1]. For all the cells received in our devices the location of the tower was successfully obtained.

The cell connected tower method makes use of the CellLocation obtained us-ing the cellular method and the cellID database, approximatus-ing the user location by the location of the tower that provides cellular service. The transformation performed is shown in Figure 13.

(35)

3.6.4 Cellular tower trilateration

This method also uses the CellLocation and the CellID datababe, but it also makes use of the di↵erent neighbor cells obtained. This method applies the trilateration techniques explained in the background to estimate the location of the user. Let us give an example considering three cells (the connected cell and two neighbors). The picture can be seen in Figure 14.

Figure 14: Cell tower trilateration

The input for this method is three di↵erent cell towers, whose RSSI (Received Signal Strength Indication) is known, obtained from the GSM Provided. The geographical coordinate of the towers is known as well, obtained from the CellID database. Connected tower and neighbors are treated exactly the same way.

The idea of the method is that the user will be inside the polygon created by the towers, and closer to the ones whose signal strength is higher. For each of the towers, a weight is calculated using Equation 2. Let us note that the RSSI must be in natural units, like Watts.

wi=

RSSIi(W )

P

jRSSIj(W )

(2)

Once we calculate all the di↵erent weights, the estimated location can be calculated using the Equation 3.

(x, y) = (X j xj· wj, X j yj· wj) (3)

(36)

Let us note that for only two towers, the estimated location will be over the line that links both towers. For only one cell, the estimated location would actually be the position of that tower. Then, we can say that the previous method described, Cellular connected tower, is just a single case of Cellular tower trilateration for only one tower.

3.7 Area composition

As explained in section 3.2, an area is a target place where we will apply pre-caching. To successfully apply it, three zones related to that area are defined: pre-cached zone, download zone and tracking zone.

3.7.1 Pre-cached zone

The pre-cached zone is the zone where the content must have already been downloaded and it is ready to be used. The definition of this zone will typically be determined by the geographical outline of the building or surface. Thus, the polygonal geographical zone is the one that best fits the pre-cached zones.

3.7.2 Download zone

The download zone is the zone where the content will actually be downloaded. Considering that the target zone is a building, the download zones would be typically round geographical zones around its doors. It is important that this zones are as small as possible, so users who are not actually going to enter in the pre-cached zone are not detected to be inside the download zone, triggering an undesired download.

This phenomenon is illustrated in Figure 15. There are two possible defi-nitions of the download zone, Dw1 and Dw2. The user was just passing by as it is described in the path, and he/she did not entered in the pre-cached zone. Using Dw1, no download is triggered. However, using Dw2, the user is detected to have entered in the download zone, triggering an undesired download.

For that reason, the minimum radius that this area must have to allow the download of the content will be calculated in this section.

There is no restriction that imposes that the zones must be round and not polygonal, but the round definition fits better with the model that will be de-scribed below. Otherwise it is possible to define a polygonal zone ensuring that regardless of the path that the user follows, a distance equal or bigger than this radius is covered.

To determine the radius of a download zone, we must ensure that the user spends inside enough time to download all the content. Excluding the e↵ect of the location provider, we must take into account the following factors:

(37)

Figure 15: Undesired download

• User moving speed

• Volume of data to download • Download rate

The e↵ect produced by the inaccuracies of the location method is not con-sidered here and will be explained in Section 3.8.1.

We will define the minimum radius of the download zone, radiusd, as a

function of the actual data to be downloaded (assuming that the cache is empty), the expected user moving speed and download rate.

We can express the user moving speed as ratio between distance and time:

speed(m/s) = distance(m)

time(s) (4)

And also the download rate as a ratio of data volume and time:

download rate(bytes/s) = data volume(bytes)

time(s) (5)

Combining both equations we can get an expression for the radius:

radiusd(m) =

speed(m/s)· data volume(bytes)

download rate(bytes/s) (6) For example, for an average walking speed of 5, 4 km/h (1, 5 m/s), a data volume of 4 M B and a download rate of 300 kB/s, we would obtain radiusd =

(38)

3.7.3 Tracking zone

The tracking zone is the zone in which the client application starts monitoring its location in a more frequent and accurate way, to be able to detect the moment in which the user enters in the download zone to trigger the download.

Of course, even when the user is outside the tracking zone, the location of the user has still to be detected, but less resource-intensive methods could be used even if their accuracy is not too high. The transition that must be detected in an accurate way is from the tracking to the download zone.

To resolve the tradeo↵ between accuracy and resources consumption, an hybrid mode using two di↵erent methods will be introduced in Section 3.9.2. Basically, one method will be used until we detect that the user has entered in the tracking zone. Then, the system will switch to the other method, providing better results of accuracy but consuming more resources.

Assuming the usage of this hybrid mode, the size of a tracking zone must be just big enough so the provider in charge of the download zone is successfully set up before the user actually enters in the download zone.

The minimum radius of the tracking zone, radiust, is calculated in Equation

7. User speed is the expected user speed. To cover the worst case, the fastest expected speed should be considered.

radiust= userSpeed· setupT ime|download (7)

3.8 Zone detection

Once the di↵erent definitions of each zone and locations have been explained, it is time to explain how the system detects if a location is inside a zone. This is only done for geographical zones, as Wi-Fi zones were finally left out of the scope of the project, and cellular locations are only used to apply trilateration obtaining a geographical location.

3.8.1 Geographical zone detection

Determine if a geographical coordinate is inside a geographical zone is basically a geometrical problem. We have defined two di↵erent types of geographical zones: round zones and polygonal zones.

Let us recall that there is also an accuracy parameter, providing the maxi-mum error in meters that the provider is introducing. Let us call location center to the coordinate provided by the location provider and location circle to the circle formed by the location and its maximum error. The problem to resolve is to determine if the location circle is either inside the zone or it at least intersects the zone.

(39)

For the round zones, the determination of the zone is trivial, we only have to check if the distance between the location coordinate and the center of the zone is smaller than the sum of the radius of the zone and the accuracy of the location.

Figure 16: Round geographical zone detection

For the polygonal zones, the problem is more complex. We can divide it in two cases:

• Case A: The location center is inside the polygonal zone

• Case B: The location center is outside but there exists an intersection between the location circle and the zone

Figure 17: Polygonal geographical zone detection

To detect the case A, we run al algorithm that carries out the following approach: it draws horizontal lines from the location coordinate towards the right of the zone. Then, it counts the number of intersections. If the numer is odd, the location is inside the zone. If the number is even, it is outside. A graphical insight can be seen in Figure 17.

If case A is false, the system will try to detect the case B. For it, it will take each segment and calculate the distance from the location center to the segment. If this distance is smaller than the radius, case B is true.

(40)

3.8.2 Location provider influence

We have mentioned that the error added by the provider is considered in the accuracy parameter of the geographical location. There are two di↵erent sources of error that the location provider introduces.

First, the inherent error provided by the method itself. For example, using GPS, the accuracy obtained can be di↵erent depending on the number of satellites received, the weather conditions, etc. The Android API provides the accuracy parameter of GPS and the Google Network location service while we request for location updates.

Second, the error due to the sampling, as the location is only obtained every certain period. Let us have a look at Figure 18. We can see two di↵erent paths, both heading towards the center of the zone. Following the path P, the zone is still not detected in p3 ; it is detected in p4 which is already a bit far from the border. However, while following the path Q, the zone is detected in q4, which is just after the zone border.

Figure 18: Insight of sampling error

The maximum di↵erence that we can obtain in time is the sampling period: Ts. To take this into account, we add to the accuracy parameter a contribution

that makes the system detect the zone on time, regardless of the deviation of the sampling. This is calculated for a given speed, that must be maximum to consider the worst case:

Accuracys= speed· Ts (8)

For a Ts= 1s and speed = 5, 4 km/h = 1, 5 m/s we would obtain Accuracys=

1.5m, and it would increase linearly with the sampling period. For a Ts= 5s,

we would obtain Accuracys= 7.5m, which is a value to take into account.

3.9 Area zone determination

We have already described how, given a zone, we check if a location is actually inside the zone or not. However, an area is a composition of di↵erent zones (tracking, download and pre-cached), and the user can be detected to be in more than one zone, as it is illustrated in Figure 19. So in which zone of the area is the user determined?

(41)

Figure 19: Area zones determination

Besides, we have mentioned that we could use di↵erent location providers to detect the tracking and the download zone. The hybrid model will be also introduced in this section.

3.9.1 Simple mode

The simple mode means that we use the same location provider to detect both the tracking and the download zone. To determine in which zone of an area the user is, we have implemented an algorithm. The main idea is that we want the user to be determined in the most restricted area as possible. Then, the algorithm could look somewhat similar to the one described in Figure 20.

Figure 20: Area determination activity diagram

However, because the location providers are not perfect, and because of the zone detection approach, this algorithm can be the source of problems. This is

(42)

illustrated in Figure 21. The user follows the path described. During that time, the location provider retrieves four locations. In l1, the user is determined to be in the tracking zone, and in l2, l3 and l4, is determined to be in the pre-cached zone. The download zone is never determined, so the content is never downloaded.

Figure 21: Download zone not detected

To solve this, we add a restriction to the algorithm: The pre-cached zone can only be detected if the previous zone was the pre-cached zone itself or the download zone. With this restriction, the user will be determined to be in:

• Location l1: tracking zone • Location l2: tracking zone • Location l3: download zone • Location l4: pre-cached zone 3.9.2 Hybrid mode

We have introduced the notion of hybrid mode, stating that we could have two di↵erent location providers, a tracking detector and a download detector, with di↵erent accuracies and resources consumptions.

The tracking detector cannot consume many resources, as it will be the one used by default, and it will be one most of the time. However, the accuracy requirements are not high, some hundred meters would be enough.

As soon as the tracking zone is detected for at least one zone, the system will switch to the download detector, which must have a higher accuracy to correctly detect the download zone and avoid undesired downloads.

Once the system has switched to the download detector, it will remain until either the user goes out of the tracking zone, or the download zone is detected. As soon as the download is triggered, the accuracy requirements are not high anymore, so we can switch back to the tracking zone detector.

(43)

There is a problem that this algorithm does not solve. Let us use Figure 22 to illustrate it. Let us imagine that the user is static in the position shown with the red dot. Using the tracking detector, the dark blue location will be provided, determining the system in the tracking zone. Then, the system will switch to the download detector. However, this detector provides a more accurate location, which shows that the user had not really entered in the tracking zone. Then, it will be switched back to the tracking detector. Thus, this algorithm is not stable and it may makes the system switch between detectors constantly.

Figure 22: Tracking and download detectors

To solve this, once the tracking zone is detected, the first location obtained with the download detector will be saved. Only if the user is detected to move in the opposite direction of the tracking zone the system will switch back to the tracking detector.

Another restriction while using the hybrid model is that we will not allow to detect download or pre-cached zones using the tracking detector. The system must switch first to the download detector and detect it with the accurate location provider.

3.10 Communication protocol

In this section we will explain the di↵erent messages sent between client and server in order to successfully carry out the location based pre-caching. We have eight di↵erent messages: four messages and its ACK. Let us give a simple use case to illustrate it, it can be graphically seen in Figure 23.

As we explained in Section 3.2, a Region is just a definition of a geographical zone and the areas that lie inside. When the client system detects to have entered in a di↵erent region, as can be seen in point 1 in Figure 23, it sends an UpdateMessage, with the current location. The server will immediately answer with an UpdateMessageACK, providing the geographical zone of the new region and the id of its areas (in this case 1 and 2).

(44)

Figure 23: Communication protocol messages in map

If the client did not have the area information in its memory, or if it is not up to date, it will request it with a RequestAreaMessage, sending one for each desired area. The server will answer with a RequestAreaMessageACK providing the area information, which is basically the definition of each of the zones.

Until the client enters in a download zone of an area, no communication with the server takes place. When this happens, as shown in the point 2 of the mentioned figure, the client will send a RequestContentMessage, providing the id of the data objects that are already cached in the memory of the device (in this case 1 and 4). The server will answer with a RequestContentMes-sageACK, and will start pushing the content into the client. Each of the content objects will be encapsulated in a PushContentMessage. When the client has successfully received it, it will send a PushContentMessageACK.

(45)

Figure 24: Communication protocol messages in time diagram

4 Network coding solution

4.1 Design motivation

In proposed system, network coding improves the retransmission efficiency be-cause the objects in cache are used for decoding in transmissions. Di↵erent new objects could be recovered through one single coded retransmission by di↵erent receivers. The transmission is more efficient due to the fact that the encoded transmission is ”valuable” to more receivers.

During the transmission phase, in order to get the new object, the receiver must be able to successfully decode the encoded transmission. From the con-cept of algorithm design, the following mechanism is proposed to keep the transmission decodable. Considering one single binary encoded transmission, Pk1 Pk2 ... Pkj , if it could be decoded by receiver Ri, it must satisfy the

following condition: j P t=1 Ti,kt  1 (9)

In NCWBR [41], if the transmitted object is encoded from two or more original objects which are not in the receiver’s cache, which means that the retransmitted packet does not fulfil the condition, the receiver would drop the transmission and will send a N ACK response to the sender. These required

(46)

objects will be transferred in other transmissions. In [42, 26, 33, 27], this kind of situation is avoided by the algorithm design in the sender. By doing so, every transmission should fulfil the condition 9.

However, if the problem is extended to a series of transmissions, every trans-mission should fulfils the condition becomes only a sufficient condition, but no longer necessary. Not every transmission must be decoded immediately when received, as long as it is guaranteed all the transmissions are decoded eventually. It is obvious that the previously transmitted objects can be used to decode new transmissions, but it is also possible to do it in reverse order: using new objects to decode previous transmissions.

Let’s take a simple example, assuming receiver R require 2 objects P1and P2

during the transmission. P1 P2 can be transferred, but could not be decoded

by R at the moment. In this case, NCWBR[41] scheme will drop the packet. Other binary network coding schemes, for example [42, 26, 33, 27], will set up a mechanism to avoid this situation. However, there is one more possibility: P1 P2 could be stored first, and then P2 is transferred. P1 P2 could be

decoded by using P2and thus both P1 and P2can be gotten.

The advantage for adopting cached decoding is that it lowers the restric-tions on decoding in binary network coding systems. Thus, it increases the possibility to group more objects into one single coded transmission. It also introduces more information entropy to the system for every transmission and thereby reducing the total number of transmissions. For example in the Sample Object Matrix in table 2 in background section, for any of those algorithms with restriction that each transmission should fulfil the proposed condition 9, the transmission number would be at least 4. However, it could be seen that, the best solution is 3 transmissions P1, P2 P3, P3 P5 P6. Considering

U1, although it could not decode the transmission P2 P3 when it receives

it, it could however be restored using the proposed method. Once it receives P3 P5 P6, it could recover P3 first, and then recover P2 from the restored

P2 P3.

4.2 Solution and algorithm

4.2.1 System scheme

Based on the cached decoding mechanism described above, a transmission sys-tem would be designed as in figure 25. The main di↵erence between this syssys-tem and other binary NC system is that the partial decode storage is introduced. It is used to restore the encoded transmissions which can not be decoded by the receiver in real-time. The other di↵erence is that a well-designed network coding algorithm is needed in the sender to support the cached decoding method.

(47)

Figure 25: Network Coding Transfer Structure

Figure 26: Network Coding Transfer Structure

4.2.2 Decoding scheme

On the receiver side, the decoding logic can be summarized by the following steps:

1. Get the input encoded transmission, which is binary XORed by k original objects in set X =_{Pj1, Pj2, .., Pjk}.

(48)

which are already in cache before transmission or successfully transferred and decoded in previous transmissions. Calculate X S. The packet partial cache rate and partial cache distribution among di↵erent receivers are independent.

3. If X S = , the transmission contains no new information the receiver. Drop the transmission and finish.

4. If|X S| > 1, the transmission is currently not decodable, store it into the Partial Decode Storage(PDS), and finish.

5. If|X S| = 1, one object could be successfully decoded directly from the transmission. Decode out the object Pd and put it into the process queue.

6. While process queue is not empty, get one object Pdi out from the queue

and put it into the Object Storage S.

7. Search in the PDS. For all partial decoded transmission in PDS which are XORed from Pdi, remove Pdi from the encoding by calculating the

transmission XOR Pdi. The partial decoded transmission after removal of

Pdi would be encoded by j objects Pq1, Pq2, ..., Pqj(j 1). If j > 1,restore

it back to PDS. If j = 1, put it into the process queue. 8. Loop to step 6.

The algorithm 1 is a more formal description of the decode logic.

4.2.3 Encoding scheme

Considering the corresponding network coding algorithm on the sender, to find the optimal solution to minimize the transmission number is a NP-Hard problem[25][30]. One near-optimal solution within a feasible complexity is pro-posed.

For ease of explanation, it is defined that a transmission is valuable to a receiver if the transmission is linearly independent to all the objects and trans-missions previously received by this receiver.

In order to minimize the total number of transmissions, in each single trans-mission, greedy algorithm is used to compose a transmission R so it maximizes the number of receivers to whom R is valuable. The coding algorithm for a single transmission is divided into 3 steps.

Step 1: greedily construct a set of objects G, so as many receivers as possible can find a valuable object in G, while no receivers get more than one valuable object. For receivers which can not find a valuable object in G, it is called remaining receivers.

Step 2: greedily construct a set of objects G0, so as many remaining receivers as possible get a valuable object in G0, while none of them get more than one valuable object.

(49)

Algorithm 1 Decoding algorithm on Receiver:

Require: The encoded transmission R = Pj1 Pj2 ... Pjk, Object Storage

S ={Ps1, Ps2, ..., Pst} ,PDS C = {R 0 c1, R 0 c2, ..., R 0 cd} 1: X {Pj1, Pj2, ..., Pjk} 2: if |X S| = 0 then 3: return 4: end if 5: if _|X S_{| > 1 then}

6: Decode R using objects in S, get R0cd+1 = XORed object in X S

7: Put R0cd+1 into C

8: return

9: end if

10: if _|X S_{| = 1 then}

11: Decode R using objects in S, get decoded object Pd

12: Create process queue Q =_{Pd}

13: while Q6= do

14: Get Pdi from Q and remove Pdi from Q

15: Put Pdi into S

16: for all R0ci which could be decoded by Pdi do

17: R0ci R 0

ci Pdi

18: if R0ci is a fully decoded object then

19: Put Rc0i into Q and remove R 0 ci from C 20: else 21: Restore Rc0i back to C 22: end if 23: end for 24: end while 25: end if

Location Based Pre-caching and Network Coding in Smart Content Distribution