Practical analysis of the Precision Time Protocol under different types of system load

EMIL GEDDA

ANDERS ERIKSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Bachelor Thesis in Computer Science

Date: June 7, 2017
Supervisor: Pawel Herman
Examiner: Örjan Ekeberg

Swedish title: Praktisk analys av IEEE 1588 under olika typer av systembelastning



Abstract

The existence of distributed real-time systems calls for protocols for high-accuracy time synchronization between devices. One such protocol, the Precision Time Protocol (PTP), reaches sub-microsecond synchronization precision.

PTP can be implemented both in hardware and in software. This study aimed to analyze how system stress affects the accuracy and precision of software-implemented PTP between two devices. This was done using two Intel Galileo Generation 2 boards running Linux. Software was used to simulate CPU, I/O, network, and OS load. Data was extracted from software logs, summarized in charts, and then analyzed.

The results showed that PTP synchronization accuracy and precision do suffer under certain types of system load, most notably under heavy I/O load. However, the results might not be applicable to real-world scenarios, due to limitations in the hardware and because the synthetic stress tests do not correspond to real-world usage. Further research is required to analyze why and how different types of system load affect PTP's accuracy and precision.


Summary (Sammanfattning)

The existence of distributed real-time systems demands protocols for accurate time synchronization between devices. One such protocol, the Precision Time Protocol (PTP), can achieve sub-microsecond precision during synchronization.

PTP can be implemented in both hardware and software. This report focuses on analyzing how system load can affect the precision and accuracy of software-implemented PTP between two devices. The tests were performed on two Intel Galileo Generation 2 boards running Linux. Software was then used to simulate load on different subsystems such as the CPU, I/O, the network, and the operating system. Data was extracted from the software's logs, summarized in charts, and then analyzed.

The results showed that the precision and accuracy of PTP deteriorate under certain types of system load, most noticeably under heavy I/O load. However, the results are potentially not applicable to real-world scenarios, due to limitations in the hardware and because synthetic stress tests do not correspond to normal load. Further research is required to analyze how and why different types of system load affect PTP's precision and accuracy.


Contents

1 Introduction
1.1 Purpose
1.2 Problem Statement
1.2.1 Research question
1.3 Scope
1.4 Outline

2 Background
2.1 The Precision Time Protocol
2.1.1 Best master clock algorithm
2.1.2 Synchronization
2.1.3 Hardware versus software timestamping
2.1.4 The terms accuracy and precision
2.2 Related Work
2.2.1 Hirschmann
2.2.2 Study at Tampere University of Technology
2.2.3 Hardware Tests
2.3 Testing Environment
2.3.1 Hardware
2.3.1.1 Intel Galileo
2.3.1.2 Storage device for the Intel Galileo
2.3.1.3 Wiring
2.3.2 Software
2.3.2.1 Build system
2.3.2.2 Build environment
2.3.2.3 The Linux PTP Project
2.3.2.4 Stress-ng

3 Method
3.1 Preparing the hardware
3.1.1 Building the custom Linux image
3.2 Testing

4 Results
4.1 Reading the Charts
4.1.1 Calibration
4.2 No Load
4.3 50% CPU Usage
4.4 100% CPU Usage
4.5 I/O Stress
4.6 Network Stress
4.7 OS Stress
4.8 Comparison

5 Discussion
5.1 CPU heavy tests
5.2 I/O, OS and Network tests
5.3 Reliability and testing environment
5.3.1 Synthetic tests
5.4 Improvements and future research

6 Conclusions

References

Appendices
A Preparing SD Card for Linux
B Building a Linux Distribution for the Intel Galileo
C Configurations of PTP Implementations
D Data Extraction


Chapter 1 Introduction

Many modern computer systems are completely reliant on processes being finished by specified real-time deadlines. Whenever missing a real-time deadline causes a total system failure, we call this a hard real-time system[1][2][3]. These types of systems serve a wide variety of functions in many different fields, ranging from industrial applications to computerized financial systems[1][2]. The need for hard real-time systems raises the need for accurate time synchronization methods between microcontrollers in distributed systems, and several widely used time synchronization protocols are available, with a range of achievable accuracies[1].

The IEEE 1588 standard defines one of these protocols, called PTP, the Precision Time Protocol. PTP is used for very accurate time synchronization, in the sub-microsecond range, usually over a local area network[4].

The IEEE 1588 standard specifies the minimum required accuracy for implementations of the protocol as "microsecond to sub-microsecond accuracy and precision"[4]. IEEE 1588 performance has been measured both in real-world tests using hardware and in software simulations[5][6][7].

This project aims to determine how reliable time synchronization between microcontrollers is when using software-implemented PTP. This is done using two Intel Galileo development boards running Linux, connected with a category 5 ethernet cable, and software simulating system load.


1.1 Purpose

Due to the constraints placed upon hard real-time systems, time synchronization must have a strict margin of error, or it could cause a total system failure. This study examines the effects different types of system load can have on the accuracy of software-implemented PTP.

1.2 Problem Statement

This report measures how much PTP synchronization accuracy and precision are affected by system load, using two Intel Galileo development boards. The goal is to determine whether the accuracy and precision of software-implemented PTP deteriorate under certain types of system load.

1.2.1 Research question

Does software-implemented PTP synchronization accuracy and precision suffer from system load?

1.3 Scope

Other protocols, such as NTP and GPS, exist for synchronizing time between devices, but they are not tested in this report. The project budget limits the number of PTP devices in our tests, and as such, the tests only involve two Intel Galileo development boards.

None of the tests in this report use hardware-accelerated PTP, and as such, hardware acceleration of PTP is not described in depth in the background chapter.

The terms IEEE 1588, PTP, and Precision Time Protocol are all used interchangeably to refer to the IEEE 1588-2008 standard. This report is not concerned with the deprecated IEEE 1588-2002 standard.

1.4 Outline

This report is divided into six chapters. This first chapter, Introduction, gives an overview of the project, including the purpose, problem statement, and scope. The second chapter, Background, gives a brief background of the technologies used in creating this report: the PTP protocol as well as the hardware and software used in testing. Selected related works are also presented in this chapter. The third chapter, Method, describes how the testing environment was set up and how testing was carried out. In the fourth chapter, Results, the results of the tests are presented. The results are then discussed in the fifth chapter, Discussion.

The sixth and last chapter, Conclusions, presents the conclusions drawn.

References follow the last chapter, followed by appendices describing technical details of the Background and Method chapters.


Chapter 2 Background

This chapter gives an introduction to the technologies involved in this report. A background to the Precision Time Protocol (PTP), including how it works in practice, is given. After PTP has been introduced, work related to our research question is presented. Finally, the testing environment, including both the hardware and software used, is presented.

2.1 The Precision Time Protocol

The aim of the Precision Time Protocol (PTP) is to provide an administration-free, fault-tolerant, standardized protocol with sub-microsecond accuracy for synchronizing clocks in distributed systems within a local area network. PTP was designed for environments where using GPS would be too expensive or simply not possible due to connectivity issues. In 2004 it was tested in practice and achieved an accuracy of roughly 50 ns between the connected devices. PTP is used mostly in industrial testing and automation environments[1]. Compared to other time synchronization protocols it is still relatively new, and a number of different implementations are currently in use. PTP was originally defined in the IEEE 1588-2002 standard in 2002. In 2008 a revised standard, IEEE 1588-2008, was released with minor improvements.

IEEE 1588-2008 defines PTP to use a master-slave hierarchy for clock synchronization in a network. The network elects a grandmaster clock using the best master clock (BMC) algorithm, and all the other devices become slaves to that master[4]. There may only be a single grandmaster in a network, but multiple master clocks are allowed. PTP defines three different clock types: ordinary clock, boundary clock, and transparent clock. An ordinary clock has a single PTP port, while a boundary clock has multiple PTP ports. The sole purpose of a boundary clock is to convey clock synchronization from a single master clock to multiple ordinary clocks. The boundary clock also refreshes the timestamp on each packet sent from the clock's master, whereas a transparent clock does not[4].

Figure 2.1: Illustration of a PTP network

2.1.1 Best master clock algorithm

The best master clock algorithm (BMCA) runs continuously on all boundary and ordinary clocks in the network. Because it runs constantly, a clock is able to adapt to dynamic changes in the network, e.g. a grandmaster change or a partial blackout of the network. BMCA uses data sets consisting of fields from two clocks to compare which clock is more eligible to be the grandmaster clock. Whenever a new potential grandmaster enters the network, all clocks compare the data set of the old grandmaster with that of the new candidate. BMCA compares the fields one by one, displayed below in decreasing precedence, only consulting the next field if the previous fields in the data sets are identical[4].

Priority One: A field specified by the user, ranging from 0 to 255, where a lower numeric value indicates a higher priority in the BMCA.

Clock class: The type of clock used by the device, ranging from a GPS-synchronized or atomic-calibrated clock to a slave-only clock. A clock of the highest class shall never be used as anything other than a grandmaster clock[8].

Clock accuracy: The accuracy of the clock specified in units of time, ranging from an accuracy of 25 ns to an accuracy of less than 10 seconds[4].

Clock variance: Static statistics of the variance and stability of the clock.

Priority Two: Like the Priority One field, but with lower precedence.

Unique Port ID: Often based on the physical MAC address of the port; this is only used in case of a tie and has no actual relation to the accuracy of the clock.

This way, two different devices comparing the same data sets (device A comparing itself to device B and vice versa) will always come to the same decision about which clock should be the grandmaster.
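To make the comparison order concrete, the following sketch shows one way the pairwise decision could be implemented. It is our illustration only: the field names and encodings are simplified stand-ins, not the data-set layout defined by IEEE 1588-2008, in which a lower numeric value wins each of these comparisons.

from dataclasses import dataclass

# Illustrative sketch only: simplified stand-ins for the IEEE 1588-2008
# data-set fields, listed in decreasing precedence. For every field a
# lower value is considered better.
@dataclass(frozen=True)
class ClockDataSet:
    priority1: int       # user-configured, 0-255
    clock_class: int     # type/quality of the clock source
    clock_accuracy: int  # enumerated accuracy of the clock
    clock_variance: int  # static variance/stability statistic
    priority2: int       # user-configured tiebreaker
    port_id: bytes       # unique port identity (often MAC-based), final tiebreaker

def compare_key(c: ClockDataSet):
    # Fields in decreasing precedence; a later field is only consulted
    # when all earlier fields compare equal.
    return (c.priority1, c.clock_class, c.clock_accuracy,
            c.clock_variance, c.priority2, c.port_id)

def better_master(a: ClockDataSet, b: ClockDataSet) -> ClockDataSet:
    """Return the data set describing the more eligible grandmaster."""
    return min(a, b, key=compare_key)

# Example with invented values: equal priorities, so the decision falls
# through to the clock class comparison.
gm_old = ClockDataSet(128, 248, 40, 65535, 128, b"\xaa\x01")
new_candidate = ClockDataSet(128, 6, 33, 256, 128, b"\xbb\x02")
assert better_master(gm_old, new_candidate) is new_candidate

Because both devices evaluate the same pure function of the same two data sets, device A comparing itself with device B reaches the same verdict as device B comparing itself with device A, which is exactly the symmetry the paragraph above relies on.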

2.1.2 Synchronization

After a master clock has been chosen, synchronization may begin. According to IEEE 1588-2008, the master clock may broadcast the current time as often as 10 times per second to all its slaves. A time broadcast usually consists of 3 messages in total, two from the master and one from the slave, as shown in figure 2.2.


Figure 2.2: Synchronization flow

The master starts by sending a Sync message at time T1, and each receiving clock stores the time T1′ at which the message was received. If the master has PTP support in its network hardware, the Sync message itself will also contain the time T1; otherwise a Follow_Up message containing T1 will be broadcast to all slaves. Now all slaves have the correct time of the master, and only the transit delay of the packets in the network is left to calculate. The slaves respond to the master with a Delay_Req message, timestamping the time it was sent as T2. The master responds to each request with a Delay_Resp packet containing T2′, the time at which the master received the delay request packet. All slaves are now able to calculate the transit delay, the time it takes for a message to travel from the master to a slave, and their own offset relative to the master. The time offset is defined as the difference between a slave's clock and its master's clock. Let d be the transit time and ω the offset between the two clocks. Since we assume the transit time is constant[4], we may calculate ω as follows:

\[ \omega + d = T_1' - T_1 \]

\[ -\omega + d = T_2' - T_2 \]

\[ \omega = \frac{(T_1' - T_1) - (T_2' - T_2)}{2} \]

Each slave calculates its offset relative to the master clock and then adjusts its own clock accordingly. Since synchronization may happen several times per second, clock wander and jitter are assumed to be negligible[4].
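Adding the two equations above also yields the transit delay, d = ((T1′ − T1) + (T2′ − T2)) / 2. As a worked illustration (our example; the timestamps are invented), the sketch below recovers both quantities from one exchange:

def offset_and_delay(t1: float, t1p: float, t2: float, t2p: float):
    """Solve the two equations above for the offset and the transit delay.

    t1  -- master time when the Sync message was sent (T1)
    t1p -- slave time when the Sync message arrived (T1')
    t2  -- slave time when the Delay_Req message was sent (T2)
    t2p -- master time when the Delay_Req message arrived (T2')

    Assumes, as the standard does here, a constant transit delay that is
    the same in both directions.
    """
    offset = ((t1p - t1) - (t2p - t2)) / 2  # omega
    delay = ((t1p - t1) + (t2p - t2)) / 2   # d
    return offset, delay

# Invented example: the slave clock runs 15 us ahead of the master and
# the true one-way delay is 5 us. Sync leaves the master at 1.000000 s
# and is stamped 1.000020 s by the fast slave; Delay_Req leaves the
# slave at 1.000100 s and reaches the master at 1.000090 s.
omega, d = offset_and_delay(1.000000, 1.000020, 1.000100, 1.000090)
print(omega, d)  # ~1.5e-05 s (15 us offset), ~5e-06 s (5 us delay)

The slave would then subtract ω from its own clock to align itself with the master.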

2.1.3 Hardware versus software timestamping

Figure 2.3: Methods of timestamping. Software timestamping takes place at the application layer, above the Linux kernel network stack; hardware timestamping takes place at the physical layer, next to the physical communication path.

There exist multiple ways of timestamping in PTP. When software timestamping is used, the time spent in the Linux kernel network stack is not accounted for, as shown in figure 2.3, but is instead added to the offset between the master and slave. This increases the offset by two orders of magnitude[6]. In this report, only software timestamping is considered.

2.1.4 The terms accuracy and precision

The terms accuracy and precision are used extensively throughout this report. Wikipedia defines the two terms as follows: "Accuracy is the proximity of measurement results to the true value; precision, the repeatability, or reproducibility of the measurement"[9]. In our report, the word accuracy describes how close to zero the synchronization offset between a master and slave is. Precision refers to how consistent the running average synchronization offset is. In the context of a PTP implementation, the word performance refers to both accuracy and precision.
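As a small, invented numerical illustration of the distinction (not a computation performed in this thesis, and using the standard deviation as one simple proxy for the consistency described above), one run of offsets can be accurate but imprecise while another is precise but inaccurate:

import statistics

# Two invented runs of synchronization offsets, in nanoseconds.
run_a = [120, -80, 150, -110, 90]   # centred near zero, but spread out
run_b = [950, 960, 955, 945, 950]   # far from zero, but tightly clustered

for name, run in (("A", run_a), ("B", run_b)):
    mean_offset = statistics.mean(run)  # accuracy: closeness to zero
    spread = statistics.stdev(run)      # precision: consistency of offsets
    print(f"run {name}: mean offset {mean_offset:.0f} ns, spread {spread:.0f} ns")

# Run A is the more accurate (mean near 0) but less precise;
# run B is the more precise but less accurate.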

2.2 Related Work

This section presents work related to our research question, "Does software-implemented PTP synchronization accuracy and precision suffer from system load?", presented in chapter 1, Introduction.

2.2.1 Hirschmann

Hirschmann, a German company specializing in industrial networking, has released a white paper providing "an overview of the application possibilities and function of the Precision Time Protocol"[6]. The paper reports PTP accuracies of "about 5 to 50 µs as a pure software solution".

2.2.2 Study at Tampere University of Technology

A study from the Institute of Digital and Computer Systems at Tampere University of Technology in Finland analyzed the results of software-implemented PTP over a Wireless Local Area Network (WLAN), which also included results using a direct ethernet connection between two laptops. The tests presented in the study show an average PTP accuracy of 7 µs between the two laptops, with a variance of 10.8 µs²[7].

2.2.3 Hardware Tests

Professor Hans Weibel at Zurich University of Applied Sciences in Switzerland published an article in the Embedded World journal in 2005 analyzing accuracies using hardware-implemented PTP. The study shows sub-100-nanosecond accuracies with "the significant variation [...] within ±80 ns"[5].


2.3 Testing Environment

This section presents information about the testing environment used during the creation of this report. All software, hardware, and configuration needed are presented. Where needed, appendices containing additional details are referred to.

2.3.1 Hardware

This section contains short descriptions of all hardware used in this study.

2.3.1.1 Intel Galileo

The Intel Galileo (from here on referred to as the Galileo) is a development board designed by Intel Corporation. Developed for use in "Internet of Things" applications, the Galileo is equipped with an Intel Quark X1000 CPU[10]. The Quark X1000 is a 32-bit x86 system-on-a-chip running at 400 MHz. The Galileo is fitted with 256 MB of RAM. Both Galileos used in testing ran the official Intel Galileo firmware version 1.1.0.

2.3.1.2 Storage device for the Intel Galileo

The Intel Galileo uses a microSD card as its storage device. The microSD card should be formatted as a Master Boot Record (MBR) disk with a single bootable primary FAT32 partition spanning the whole device, starting at sector 2048 (as in the partitioning script in appendix A). On most Linux distributions, including Ubuntu 16.04, a tool such as fdisk may be used for partitioning the SD card. For more information, see appendix A.

2.3.1.3 Wiring

To connect the Galileo boards together, short category 5 ethernet cables are used. To monitor and connect to the devices, TTL-232R UART-to-USB cables are used.


2.3.2 Software

This section first gives a short introduction to the build system and build environment. Then, the software used in the testing is presented.

2.3.2.1 Build system

The build system used for this project is the Yocto Project[11]. The Yocto Project describes itself as "an open source collaboration project that provides templates, tools and methods to help you create custom Linux-based systems for embedded products"[11].

The build engine in the Yocto Project is called BitBake. BitBake is an asynchronous generic task executor used for handling dependencies and cross-compilation in the Yocto Project[12]. BitBake prepares and assembles a cross-compiler toolchain to compile the custom Linux distribution used in the testing of PTP. The cross-compiler collection used in this project was GCC (the GNU Compiler Collection)[13], since the Linux kernel is tied to GCC due to its use of GCC-specific extensions to the C programming language.

2.3.2.2 Build environment

The build environment chosen was Ubuntu Linux 16.04[14], as the Yocto Project officially supports Ubuntu 16.04[11]. Ubuntu 16.04 is a stable long-term support distribution and still receives hardware and maintenance updates[14]. No issues were encountered when compiling the custom distribution on Ubuntu 16.04, and thus Ubuntu 16.04 was used as the build environment throughout the testing. Ubuntu 16.04 was run inside a virtual machine, for flexibility and for better isolation from unrelated libraries and other binaries that might interfere with the build system.

Both GNOME Boxes[15] and VirtualBox[16] were tested as virtual machines for the project. Both were found to be capable enough for use throughout this entire project.

2.3.2.3 The Linux PTP Project

The Linux PTP Project (linuxptp)[17] is a free and open source implementation of the IEEE 1588 standard for Linux systems. One of the most used implementations for Linux systems, the Linux PTP Project is officially supported by Intel Corporation. The design goals of linuxptp are "to provide a robust implementation of the standard and to use the most relevant and modern Application Programming Interfaces (API) offered by the Linux kernel"[17]. The Linux PTP Project is used throughout the testing as the reference implementation of PTP.


2.3.2.4 Stress-ng

Stress-ng is a stress-testing program used to stress a computer in different ways[18]. Stress-ng can stress different computer subsystems separately, including, but not limited to, I/O, the OS, the network, and the CPU. The tests of linuxptp involve stressing these subsystems one at a time while measuring the accuracy and precision of linuxptp. Linuxptp was compiled from source using the same GCC toolchain used by BitBake when compiling in the build environment. Stress-ng was chosen due to its ease of use and because it can be built as a static executable.


Chapter 3 Method

In this chapter, all the steps taken to set up the testing environment and to carry out the tests are described in detail. First, instructions for building and booting the Linux image and for configuring the software are presented, followed by a description of how the tests were carried out.

In this chapter, whenever possible, references to other sources as well as to appendices are given for more in-depth technical detail.

3.1 Preparing the hardware

The first step in setting up the testing environment was updating the Galileo firmware using the official Intel Galileo Firmware Updater[19]. The Firmware Updater software was run on a Windows 10 PC with the Galileos connected directly over USB. Both Galileos were updated to version 1.1.0.

3.1.1 Building the custom Linux image

The process of setting up the environment for BitBake and the steps necessary to compile the modified Linux kernel are described in appendix B.2. The compiled kernel, root filesystem, and related files were subsequently copied onto an 8 GB FAT32-formatted microSD card as described in appendix A.


3.2 Testing

The testing started once the custom Linux image, with Linux PTP installed, had been built. One of the Galileo boards was the designated slave and the other the designated master. The master and slave designations did not change throughout the testing. Unnecessary processes running on the Galileo systems were killed to prevent irrelevant strain on the Linux kernel. The master was set up to broadcast Sync messages at an interval of 0.125 seconds, i.e. eight times per second instead of the default of one Sync per second; this corresponds to the logSyncInterval value of -3 in appendix C, since the interval is 2⁻³ = 0.125 s. IEEE 1588 states that up to ten broadcasts may happen per second; eight broadcasts per second was therefore chosen to gather data at a higher resolution without pushing the boundaries of the protocol.

A test consisted of connecting the two Galileo boards together with an ethernet cable and starting both stress-ng and linuxptp on both devices at roughly the same time. The linuxptp configuration and command line arguments remained unmodified for the duration of the testing and can be found in appendix C. linuxptp logs the offset between the master and slave for every Sync message sent. This output is stored and logged, and the offset data is then extracted from it as described in appendix D.

An initial test run was performed, synchronizing the two Galileos together without stress-ng running. The test ran for 45 minutes, recording more than 20,000 Sync messages. The results from the initial test run were consistent enough that the length of all subsequent tests was limited to a maximum of 15 minutes.

After that, a total of six different tests were performed. Each of the six tests had the specific purpose of stress testing a single subsystem of the device. The aim of a stress test is to measure how linuxptp performs under circumstances where a subsystem's resources are fully exhausted. This provides a general overview of which subsystems of the device affect the accuracy and precision of linuxptp the most.

No load: The first test was run without stress-ng, to provide a baseline for the average offset between the master and the slave in the context of software-based timestamping.

50% CPU load: Stress-ng was used to simulate a constant CPU usage of 50% in the second test. Only a single CPU-heavy process was run.

100% CPU load: The third test was much like the second, but with full load on the CPU for the duration of the test.

Stress I/O: The fourth test stressed the Galileo boards' I/O functionality together with the I/O scheduler of the Linux kernel. Since linuxptp is mostly I/O-bound, this test examined whether heavy I/O load would affect the precision or accuracy of linuxptp.

Stress Network: After the I/O test, a test stressing only the network itself was performed. The purpose of this test was to stress the Linux kernel network stack, to determine how large a role it plays in the accuracy and precision of linuxptp.

Stress the OS: The last test exercised only the Linux kernel, through various system call and virtual memory management stress tests.

The time offsets reported by linuxptp were recorded and logged; the results are presented in the next chapter.
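The exact stress-ng invocations are not recorded in this report. Purely to illustrate the shape of one such test run, the hypothetical sketch below drives one stressor per scenario alongside the measurement window; the stressor flags, the one-stressor-per-scenario mapping, and the 15-minute duration are our assumptions mapped onto the six scenarios above, not the commands actually used.

# Hypothetical test harness illustrating one test run per scenario.
# The stress-ng stressor choices are assumptions, not the actual test
# commands; ptp4l is assumed to be running and logging throughout.
import subprocess
import time

STRESSORS = {
    "no_load": None,
    "cpu_50":  ["--cpu", "1", "--cpu-load", "50"],
    "cpu_100": ["--cpu", "1", "--cpu-load", "100"],
    "io":      ["--io", "1"],
    "network": ["--sock", "1"],
    "os":      ["--vm", "1"],
}

def run_test(scenario: str, duration_s: int = 15 * 60) -> None:
    """Run one scenario while linuxptp keeps logging offsets."""
    args = STRESSORS[scenario]
    proc = None
    if args is not None:
        proc = subprocess.Popen(
            ["stress-ng", *args, "--timeout", f"{duration_s}s"])
    time.sleep(duration_s)  # wait out the measurement window
    if proc is not None:
        proc.wait()

for scenario in STRESSORS:
    run_test(scenario)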


Chapter 4 Results

In this chapter, results from the tests described in the previous chapter, Method, are presented. Results are presented using scatter plots and line charts with complementary explanatory text. The first section gives an introduction on how to interpret the charts presented in the chapter. Following that, results from the tests are presented in the following order: no load, 50% CPU load, 100% CPU load, I/O stress, network stress, and OS stress. After the charts for all the tests, comparison charts between tests are provided. For analysis of the results and commentary, please refer to the following chapter, Discussion.

4.1 Reading the Charts

All charts in this chapter display the PTP offset in nanoseconds on the y-axis and the time passed in seconds on the x-axis. All y-axes use a base-10 logarithmic scale, whereas the x-axes are linear. Each dot in a scatter plot represents one measurement of the offset at one moment in time. The offset is defined as the time difference between a slave's clock and its master's clock. Line charts are integrated into the scatter plots; they are created using Gnuplot's smooth bezier option to produce a more human-readable average. In the comparison charts, only the line chart is provided.

4.1.1 Calibration

Worth noting is that all charts start out with a very high offset that stabilizes rapidly after approximately 20 to 30 seconds. This is due to PTP calibration and is not representative of actual performance.

4.2 No Load

Figure 4.1: No Load Offset. Offset (ns, log scale) versus time (s).

Figure 4.1 presents the data captured during testing with no system load. With no system load, performance is relatively stable after the initial calibration has finished. Most values stay within a 100–150,000 nanosecond range.


4.3 50% CPU Usage

Figure 4.2: 50% CPU Usage Offset. Offset (ns, log scale) versus time (s).

Figure 4.2 presents the data captured during testing with 50% CPU usage. With 50% CPU usage, the offset is somewhat lower than with no load, with the exception of the unusually high offset during calibration. Most values stay within a 100–125,000 nanosecond range.


4.4 100% CPU Usage

Figure 4.3: 100% CPU Usage Offset. Offset (ns, log scale) versus time (s).

Figure 4.3 presents the data captured during testing with 100% CPU usage. The offset values, while not as stable as with no load, still mostly fit within the 100–150,000 nanosecond range seen in figure 4.1.


4.5 I/O Stress

Figure 4.4: I/O Stress Offset. Offset (ns, log scale) versus time (s).

Figure 4.4 presents the data captured during testing with I/O stress. It is clear that I/O stress gives very unstable offsets, with values above 100,000 ns being the norm and four very short dips in the offset value. Between seconds 330 and 360, the offset stays above 10⁶ ns for approximately 40 seconds. This happens once again at the 400-second mark, but this time only for approximately 20 seconds.


4.6 Network Stress

Figure 4.5: Network Stress Offset. Offset (ns, log scale) versus time (s).

Figure 4.5 presents the data captured during testing with network stress. Values are less stable than with no load, and there are also exceptionally high offsets during calibration. Most offset values are clearly within the 100–150,000 nanosecond range, just like in figure 4.1 for no load. However, the values fluctuate considerably more during network stress.


4.7 OS Stress

Figure 4.6: OS Stress Offset. Offset (ns, log scale) versus time (s).

Figure 4.6 presents the data captured during testing with OS stress. With the exception of a relatively stable period of increasing offset values from seconds 100 to 270, the values are very unstable.


4.8 Comparison

Figure 4.7: No Load / CPU Usage Comparison. Offset (ns, log scale) versus time (s) for no load, 50% CPU, and 100% CPU.

Figure 4.7 presents the data captured during testing with no load and with CPU usage at 50% and 100%. With the exception of the extremely high offset values during calibration in the 50% CPU test, the tests with CPU load perform slightly better than the test with no load. For the individual results of each test presented in this chart, please refer to figures 4.1, 4.2, and 4.3.


Figure 4.8: No Load / IO / Network / OS Comparison. Offset (ns, log scale) versus time (s) for no load, I/O stress, network stress, and OS stress.

Figure 4.8 presents the data captured during testing with no load, I/O stress, network stress, and OS stress. The only stable results in this chart come from the test with no load. All other lines fluctuate by between ±6,000 and ±500,000 nanoseconds. For the individual results of each test presented in this chart, please refer to figures 4.1, 4.4, 4.5, and 4.6.


Chapter 5 Discussion

In this chapter, analysis and discussion of the results presented in the previous chapter, Results, is given. First, the results from the different tests are discussed and analyzed. The chapter ends with a discussion of the reliability of the results and various improvements for future research. No charts are presented in this chapter; for charts and results, please refer to the previous chapter.

5.1 CPU heavy tests

The CPU-heavy tests showed that, in general, linuxptp performed more consistently, with higher precision, under no processor load than under the processor stress tests. Compared to the other tests, stressing the CPU decreased the average synchronization offset between the master and the slave. The results presented in figure 4.7 showed a more accurate but less precise average offset compared to the no-load test. This was not expected, and one possible explanation is how the Linux kernel process scheduler works: by context switching directly from stress-ng to linuxptp, the kernel removes the need to put the processor to sleep and wake it up again, which removes some of the start-up cost of a context switch that linuxptp is able to profit from. Another explanation would be processor affinity or scaling of the CPU clock frequency; the Linux kernel may scale a processor's clock frequency down to save power if the processor is idling a lot. This hypothesis assumes that in the no-load test the Linux kernel scaled the CPU frequency down to save power, which in turn affected the accuracy of linuxptp. However, this cannot be the answer, since the Intel Quark X1000 is a single-core CPU that always runs at a fixed clock frequency of 400 MHz. A full explanation requires further testing of the CPU and linuxptp, which is outside the scope of this thesis.

5.2 I/O, OS and Network tests

The visible dips in the I/O chart correspond to the moments when stress-ng switches between stress tests. Figure 4.8 shows that linuxptp performs terribly when the I/O subsystem is under heavy load. During heavy I/O load the peak offsets are as high as 1 millisecond, approximately 125 times worse than the worst average offset under no load. From around 60 seconds to 280 seconds, the I/O test heavily stresses the virtual memory of the Galileo board, as stress-ng exercises reads and writes to virtual memory. The offsets during the I/O test align with the offsets in the OS test during this period, because during the same period in the OS test stress-ng also stresses the virtual memory through various memory management system calls. The correlation indicates that the virtual memory may form a bottleneck when under heavy load. When stress-ng switches from testing the virtual memory in the OS test (after 300 seconds) to testing various other system calls, linuxptp is able to slowly recover and lower the offset over the last 200 seconds.

The results in figure 4.8 also show that the precision of linuxptp suffers when the load on the network increases. This shows that the Linux kernel network stack affects software timestamping precision to a high degree, as expected.

5.3 Reliability and testing environment

The testing environment is far from ideal. The Galileo boards are low-performance, single-CPU-core development boards. As such, the Linux kernel cannot separate computationally intensive processes (stress-ng) from high-priority processes (linuxptp) by placing them on different cores. Running these tests on a multi-core CPU may produce vastly different results. The PTP network set up in these tests is also minimal, consisting of only two devices. A real-world PTP network often involves multiple masters and slaves in a tiered configuration, as shown in figure 2.1.


5.3.1 Synthetic tests

The stress tests from stress-ng do not correspond to real-life scenarios. The synthetic tests have been designed to pressure the system into working at peak capacity, and stress-ng produces far too predictable and consistent a load on the different subsystems of the computer to be considered equal to real-world load. The tests, however, were not designed to be as close to real-world scenarios as possible, but only to show how linuxptp performs under heavy load on different subsystems.

5.4 Improvements and future research

Several improvements can be made in future testing. For instance, recording which stress test stress-ng is currently running may provide additional explanations of the charts and of the synchronization performance of linuxptp. Redoing the tests on a multi-core CPU might also show how large a role the kernel process scheduler plays in the performance of linuxptp. A multi-core setup would additionally make it possible to compare a single resource-intensive process against multiple such processes per core, which may provide further insight into the performance of software-implemented PTP. Changing the size of the PTP network may also reveal how system load affects the master clock separately from the slaves, which was not tested in this thesis due to hardware constraints.

One additional improvement would be to compare these results against a PTP network with hardware timestamping capabilities, to give more decisive results on how the Linux kernel network stack affects linuxptp under heavy load.


Chapter 6 Conclusions

Software-implemented PTP is shown to stay within a 1 to 10 microsecond range when the system is under no load, which is consistent with the studies presented under Related Work in chapter 2.

Using software-implemented PTP during heavy CPU usage is shown to improve accuracy slightly compared to the results from the test without any stress on the CPU. The reason for this increase in accuracy could be an effect of the design of the Linux kernel; however, further research is needed to verify this.

When the device is under heavy I/O or OS stress, the performance of PTP suffers greatly. This degradation is probably due to a bottleneck forming on the device or in the Linux kernel. Further research is needed to verify the reason behind the degradation.

During heavy network stress, software-implemented PTP is shown to occasionally have increased accuracy at the cost of precision. This shows that the Linux kernel network stack affects software timestamping; however, the cause is not analyzed in this report and requires further research.

These results bring us to the conclusion that the performance of software-implemented PTP does degrade under certain types of system load, at least in this specific testing environment. Additional research is needed to determine the reasons behind this degradation of accuracy and precision; the research question stated has, however, been answered.

Finally, the results might not be applicable to real-world usage of PTP. The tests were carried out using a minimal hardware setup, and the stress put on the system was only simulated, not equivalent to real-world usage.


References

[1] J. C. Eidson, Measurement, Control, and Communication Using IEEE 1588. Springer-Verlag London, 2006, ISBN: 978-1-84628-251-5. DOI: 10.1007/1-84628-251-9.

[2] J. A. Stankovic, "Real-time computing", Apr. 1992.

[3] G. Buttazzo, Hard Real-Time Computing Systems, 3rd ed. Springer US, 2011.

[4] "IEEE standard for a precision clock synchronization protocol for networked measurement and control systems", IEEE Std 1588-2008 (Revision of IEEE Std 1588-2002), pp. 1–269, Jul. 2008. DOI: 10.1109/IEEESTD.2008.4579760.

[5] H. Weibel, "High precision clock synchronization according to IEEE 1588: implementation and performance issues", Embedded World, vol. 22-24, p. 96, Feb. 2005.

[6] A. Dreher and D. Mohl, "Precision clock synchronization: the standard IEEE 1588", Hirschmann Automation and Control GmbH, Neckartenzlingen, Germany, Tech. Rep., Rev 1.2.

[7] J. Kannisto, T. Vanhatupa, M. Hännikäinen, and T. D. Hämäläinen, "Precision time protocol prototype on wireless LAN", in International Conference on Telecommunications, Springer, 2004, pp. 1236–1245.

[8] G. M. Garner, "Use of IEEE 1588 Best Master Clock Algorithm in IEEE 802.1AS", Technical Presentation, 2008.

[9] Accuracy and precision, WikiMedia Foundation. [Online]. Available: https://en.wikipedia.org/wiki/Accuracy_and_precision (visited on 05/05/2017).

[10] Intel Quark SoC X1000 Specifications, Intel Corporation. [Online]. Available: https://ark.intel.com/products/79084/Intel-Quark-SoC-X1000-16K-Cache-400-MHz (visited on 05/05/2017).

[11] Yocto Project. [Online]. Available: https://www.yoctoproject.org/ (visited on 05/05/2017).

[12] Bitbake User Manual, Yocto Project. [Online]. Available: https://www.yoctoproject.org/docs/1.6/bitbake-user-manual/bitbake-user-manual.html (visited on 05/05/2017).

[13] GCC, the GNU Compiler Collection, Free Software Foundation. [Online]. Available: https://gcc.gnu.org/ (visited on 05/05/2017).

[14] The leading operating system for PCs, IoT devices, servers and the cloud | Ubuntu, Canonical Ltd. [Online]. Available: https://www.ubuntu.com/ (visited on 05/05/2017).

[15] Boxes, GNOME Foundation. [Online]. Available: https://wiki.gnome.org/Apps/Boxes (visited on 05/05/2017).

[16] Oracle VM VirtualBox, Oracle Corporation. [Online]. Available: https://www.virtualbox.org/ (visited on 05/05/2017).

[17] The Linux PTP Project. [Online]. Available: http://linuxptp.sourceforge.net/ (visited on 05/05/2017).

[18] Stress-ng. [Online]. Available: http://kernel.ubuntu.com/~cking/stress-ng/ (visited on 05/05/2017).

[19] Intel Galileo Firmware Updater and Drivers. [Online]. Available: https://downloadcenter.intel.com/download/26417/Intel-Galileo-Firmware-Updater-and-Drivers (visited on 05/05/2017).

[20] Intel Quark SoC X1000 Board Support Package (BSP), Intel Corporation. [Online]. Available: https://downloadcenter.intel.com/download/23197/Intel-Quark-SoC-X1000-Board-Support-Package-BSP- (visited on 05/05/2017).


Appendix A

Preparing SD Card for Linux

To format a microSD card for booting Linux, the sfdisk utility may be used, which is installed by default on Ubuntu 16.04. Make sure to substitute <DEVICE> with the device name in your Linux operating system (such as sdb). This script assumes an SD card with 8 GB of storage; the size may be adjusted accordingly.

$ sudo sfdisk /dev/<DEVICE> << EOF
label: dos
label-id: 0x3af14fc4
device: /dev/<DEVICE>
unit: sectors

/dev/<DEVICE>1 : start= 2048, size= 15351808, type=83
EOF

After the SD card has been prepared and the Linux distribution compiled, the necessary files must be copied over. The boot folder and all its contents, bzImage (the kernel itself), grub.efi, image-full-quark.ext3 (the root filesystem), and core-image-minimal-initramfs-quark.cpio.gz (the initial ramdisk used by the kernel during boot) should all be copied onto the SD card.

$ find . -type f
./boot/grub/grub.conf
./bzImage
./grub.efi
./image-full-quark.ext3
./core-image-minimal-initramfs-quark.cpio.gz


Appendix B

Building a Linux Distribution for the Intel Galileo

B.1 Intel Quark Board Support Package

The Intel Quark Board Support Package (BSP) archive[20], provided by Intel, contains sources and utilities for building a base distribution for the Intel Galileo. In this project, version 1.1.0 of the BSP archive was used, since it was the latest archive containing all the required scripts and the correct configuration.

B.2 Building steps

Lines starting with “$” are considered commands for your favourite shell.

Lines starting with “#” are considered comments.

$ wget http://downloadmirror.intel.com/23823/eng/\
bsp_sources_and_docs_for_intel_quark_v1.1.0.zip
$ unzip bsp*.zip
$ 7z e *.7z
$ tar xvf meta-clanton_v1.1.0-dirty.tar.gz
$ cd meta-clanton_v1.1.0-dirty
$ ./setup.sh -e meta-clanton-bsp
$ source iot-devkit-init-build-env build
# This lets you configure the kernel however you want
$ bitbake virtual/kernel -c menuconfig
# This builds and compiles everything, may take hours
$ bitbake image-full

The kernel and related files will all be present in ./tmp/deploy/images/quark/.


Appendix C

Configurations of PTP Implementations

C.1 Linux PTP Project

linuxptp was run with a niceness of -20, to indicate to the Linux kernel the importance of the process. This makes the Linux kernel prefer linuxptp (ptp4l) when allocating CPU time during heavy load on the system.

C.1.1 ptp.conf

[global]

logSyncInterval -3
summary_interval -3

C.1.2 Command Line Options

nice --20 ptp4l -Smql 6 -f ptp.config \
    -i enp0s20f6 2>&1 > /media/mmcblk0p1/ptp.log


Appendix D

Data Extraction

D.1 Linux PTP Project

An AWK script is used to extract the time, offset, and delay from the logfile. The script also subtracts the timestamp of the first line from all timestamps, so that time starts from 0. Note that the script uses the three-argument form of match(), which is a GNU awk (gawk) extension:

# Capture the timestamp of the first line; it is later subtracted from
# every timestamp so that time starts from 0
{if (NR == 1) {
    match($1, /[0-9]+\.[0-9]+/, a);
    shift = a[0]
}}

# Extract the timestamp of the current line into 'a'
{match($1, /[0-9]+\.[0-9]+/, a)}

# Print time, offset, and delay unless the line contains the words
# "UNCALIBRATED" or "selected", which appear when PTP selects a new master
!/UNCALIBRATED|selected/ {print a[0] - shift "\t" $4 "\t" $10}

To convert all numbers to their absolute value, all data was piped through the following command:

sed 's/-//g'
