Timing and Synchronization over Ethernet


Final thesis

Timing and Synchronization over

Ethernet

by

Emil Lundqvist

LiTH-ISY-EX--15/4824--SE

February 20, 2015

Supervisor, ISY: Andreas Ehliar

Supervisor, : Victor


Abstract

This thesis investigates how time and frequency can be synchronized over Ethernet with the help of the Precision Time Protocol and Synchronous Ethernet. The goal is to achieve high synchronization accuracy in a topology of 10 cascaded nodes. The Precision Time Protocol can be implemented in different ways; these are investigated and the approach that gives the best accuracy is proposed. The thesis also covers how to recover a radio frequency, a multiple of 3.84 MHz, from Ethernet's 10.3125 GHz.

The best accuracy for time and phase synchronization is achieved by using hardware support for the timestamps and transparent clocks in the forwarding nodes. Combining this with Synchronous Ethernet for frequency synchronization, to get a traceable clock through the system, leads to the best result. The total error does not need to be greater than 1.46 ns if the asymmetry in the medium is neglected and a well-designed PCS and FIFO are used. The radio frequency is recovered from Ethernet by using the highest common frequency, with either an integer phase locked loop or a fractional phase locked loop. The fractional phase locked loop gives a better result but contributes spurs that the integer phase locked loop does not.


Acknowledgements

First of all I want to thank my supervisor Victor at the company for all the guidance and help during the thesis. I also want to thank all the experts at the company that my supervisor put me in contact with, especially Andres and Stefan, who were very helpful in discussing the problems I encountered during the thesis. I also want to thank my boss, Pierre, who gave me the opportunity to do my thesis at the company.

From the University I want to thank my fellow students for the good time as a student. I want to give an extra big thanks to Viktor Classon for reviewing my report and acting as my opponent for the thesis. I also want to thank my examiner Olle Seger and my supervisor Andreas Ehliar for their help during this thesis and for making it possible to accomplish.

Most of all I want to thank my parents Lars and Maria for supporting me during my studies and my partner Ellen Selling, who has been very supportive during the thesis. Finally, I also want to thank my friends and former neighbors, Alexander and Ruby Peck, for reviewing the report.

Emil Lundqvist


List of Figures

1.1 Shows how a tree topology is estimated with a chain topology by choosing the longest path
1.2 A chain topology with 4 nodes and 3 hops
2.1 An overview of 10GBASE-R Ethernet, an explanation for the abbreviations can be found in Table 2.2
2.2 An Ethernet packet with the following IPG
2.3 A block diagram over the PCS
2.4 FIFO with N positions and 64 bits in each position
2.5 An example how a serial scrambler can be implemented, where the operators are XOR-gates
2.6 An example how a serial descrambler can be implemented, where the operators are XOR-gates
2.7 A two-step synchronization in PTP
2.8 Illustration of a boundary clock
2.9 Illustration of a transparent clock
2.10 Three nodes implemented with Synchronous Ethernet
3.1 A simple system of two nodes
3.2 Different choices for the PTPs reference point
3.3 Showing the asymmetry in a node
3.4 An example where data1 writes to the FIFO with clock1 and data2 reads from the FIFO with clock2
3.5 The chart shows the variable delay in the FIFO for different bit widths between PCS and PMA
3.6 Show a network where a boundary clock is useful to reduce the workload from the grandmaster
3.7 The time estimation in a transparent clock
3.8 The difference between End-to-End (solid line) and Peer-to-Peer (dotted line) delay estimation
3.9 Network load for different PTP message
4.1 Block diagram over an integer PLL
4.2 Phase noise graph for an integer PLL
4.4 Phase noise graph for a fractional PLL
4.5 Phase noise graph for a fractional PLL with reduced spurs


List of Tables

2.1 The different layers in the OSI Model
2.2 An explanation for the abbreviations used in Figure 2.1


Chapter 1

Introduction

1.1 About the work

This document is a master's thesis by a student at Linköping University. The work is the last step in the master's program Applied Physics and Electrical Engineering, system on chip. The work gives the student 30 hp of the 120 hp that a master's degree contains; these 30 hp correspond to 20 weeks of studies. The student will graduate from the Department of Electrical Engineering in Linköping and the work will be done at .

The work requires an understanding of the communication protocols that are used and of how they affect the timing. This section gives a short introduction to these protocols.

1.1.1 Communication protocols

Ethernet is one of the most widely used data communication standards in the world. The standard was published in 1985 by the Institute of Electrical and Electronics Engineers (IEEE) and is defined as IEEE 802.3 [2]. The communication standard is asynchronous and based on data packets; it is discussed further in Section 2.1. In this thesis the 10GBASE-R standard is used.

Precision time protocol (PTP) is a protocol designed to synchronize real-time clocks over a network, such as Ethernet. The first version was released in 2002 and the second (and latest) version in 2008. The protocol is defined as IEEE 1588 [3]. Section 2.2 gives a more detailed overview of the protocol.

Synchronous Ethernet (SyncE) is a recommendation from the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T) on how a network can be set up to get good frequency synchronization. SyncE is discussed further in Section 2.3.


1.2 Presentation of the problem

Two tasks are investigated in this thesis: the first concerns time synchronization and the second concerns how to obtain a required frequency. Both use 10GBASE-R Ethernet as the communication protocol between the nodes.

The first task is how to give a system with many nodes the same perception of time. PTP is a known way to distribute time over Ethernet, but it can be implemented in different ways, which gives it different properties. The focus is on good accuracy combined with a relatively low cost. A recommended solution is presented together with an estimate of its time accuracy.

The frequency task is how a telecommunication frequency can be recovered from Ethernet. The transmitting frequency of 10GBASE-R Ethernet is 10.3125 GHz, while the wanted radio frequency is a multiple of 3.84 MHz, which the 10GBASE-R frequency is not.
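The mismatch between the two frequencies can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the comparison is made directly against the 10.3125 GHz line rate and a single 3.84 MHz base frequency, and the constant names are chosen here, not taken from the thesis.

```python
from math import gcd

# Both rates in hertz so that integer arithmetic can be used.
LINE_RATE_HZ = 10_312_500_000   # 10GBASE-R line rate, 10.3125 GHz
RADIO_BASE_HZ = 3_840_000       # radio base frequency, 3.84 MHz

# The line rate is not an integer multiple of the radio base frequency.
is_multiple = LINE_RATE_HZ % RADIO_BASE_HZ == 0

# Highest frequency that divides both rates an integer number of times.
common_hz = gcd(LINE_RATE_HZ, RADIO_BASE_HZ)

# Integer ratios from the common frequency up to each rate.
line_ratio = LINE_RATE_HZ // common_hz
radio_ratio = RADIO_BASE_HZ // common_hz

print(is_multiple)                          # False
print(common_hz, line_ratio, radio_ratio)   # 60000 171875 64
```

Under these assumptions the highest common frequency is 60 kHz, which sits 64 steps below 3.84 MHz and 171875 steps below the line rate.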

1.3 Restrictions

1.3.1 Topologies

The only topology that will be investigated is the chain topology with cascaded nodes. Another topology that could be of interest is the tree topology, but for timing the worst case is the longest path. This can be estimated in a single chain topology with the same number of nodes as the longest path. An example of that can be seen in Figure 1.1 where the solid line in Figure 1.1a is the longest path and can be represented with the chain topology in Figure 1.1b.

[Figure: (a) Tree topology with master M and slaves S1 to S7; (b) Chain topology with M, S1, S4 and S7]

Figure 1.1: Shows how a tree topology is estimated with a chain topology by choosing the longest path

1.3.2 The system

The system consists of one master node and a number of slave nodes that shall be synchronized to the master. The nodes are arranged in a chain topology where each node has a maximum of two logical connections, one upstream and one downstream. The maximum number of nodes is 10: 1 master node and 9 slave nodes, which gives a maximum of 9 hops. A smaller system of 4 nodes is shown in Figure 1.2 to illustrate the expressions used.

[Figure: a chain of one master and three slaves, with the labels "node" and "hop" marking a node and the link between two nodes]

Figure 1.2: A chain topology with 4 nodes and 3 hops

1.3.3 Time error budget

A network is commonly built with components from different manufacturers, and the system has certain requirements on the final network that shall be fulfilled. It is therefore beneficial to divide the requirements between different parts of the final network and give each part its own requirement or budget to follow. This thesis covers a system with cascaded nodes and the focus is on the nodes and not on the links between them. The time error from the cable belongs to another time budget that is not covered to the same extent in this thesis. It is therefore mentioned but not investigated in the same way as the nodes.

1.4 Related research

There are many studies on timing and synchronization over a network or on the protocols used in this thesis. Not all of them are relevant here, but some of the articles that are more closely related to the subject are mentioned below.

For example, one study covers how the different implementation methods behave in a highly cascaded network [14]. Another study describes how a transparent clock can be implemented [13] and also discusses different sources of time error.

There is also an article that describes how boundary clocks can be used in telecom networks, including a discussion of how SyncE can be used together with PTP [15].

Besides the articles, there is also a book that can be used for some basic understanding [12]. The book was unfortunately written before the second version of PTP was released. It covers many of the features that were released in the second version of PTP but does not go into all the details.

1.5 Outline

Chapter 1 introduces the thesis and its restrictions. Chapter 2 explains the background and the different protocols, together with some expressions that are used throughout the thesis. Chapter 3 handles the time problem: the first section explains the problem and the second section discusses different solutions. Chapter 4 discusses the different frequencies, what the problem is and what a possible solution can look like. Chapter 5 goes through the results of the thesis and summarizes how timing and synchronization can be transferred over Ethernet with the solution that gives the best possible accuracy. The last chapter, Chapter 6, gives some conclusions and a comparison with measurements from another system.


Chapter 2

Background

2.1 Ethernet

Ethernet is a family of standards for communication over a physical medium in computer networks. It is the most common technology in local area networks (LAN), and the working group IEEE 802.3 has released many standards since the first one, which was released in 1982.

In this thesis IEEE 802.3-2012 is used. This standard describes the physical layer (PHY) [9] and the media access control (MAC) part of the data link layer [8]. In the open systems interconnection model (OSI model) [6] these are found in layer 1 and layer 2. Table 2.1 lists all the layers in the OSI model.

Layer number    Name
7               Application Layer
6               Presentation Layer
5               Session Layer
4               Transport Layer
3               Network Layer
2               Data Link Layer
1               Physical Layer

Table 2.1: The different layers in the OSI Model

Only 10GBASE-R is of interest in this document; Figure 2.1 gives an overview of the system. The most important block is the physical coding sublayer (PCS), which is described in more detail in Section 2.1.2.

[Figure: layer stack from top to bottom: MAC and Reconciliation (data link layer), XGMII, PCS, Serial PMA and PMD (physical layer), MDI, Medium]

Figure 2.1: An overview of 10GBASE-R Ethernet, an explanation for the abbreviations can be found in Table 2.2

Abbreviation    Explanation
MAC             Media Access Control
XGMII           10 Gigabit Media Independent Interface
PCS             Physical Coding Sublayer
PMA             Physical Medium Attachment
PMD             Physical Medium Dependent
MDI             Medium Dependent Interface

Table 2.2: An explanation for the abbreviations used in Figure 2.1

2.1.1 Media access control

The MAC is described in [8]. The packets that are sent by the transmitter and received by the receiver are formatted as shown in Figure 2.2. An Ethernet packet consists of a preamble of 7 bytes, a start frame delimiter (SFD) of 1 byte, a MAC destination address of 6 bytes, a MAC source address of 6 bytes, a length of 2 bytes, a payload of 46-1500 bytes and a frame check sequence (FCS) of 4 bytes. The MAC destination, the MAC source and the length are often called the header of the Ethernet packet. According to the protocol an interpacket gap (IPG) needs to be sent after each packet; it has a standard minimum of 12 bytes. In total a new packet can be sent


is ready to be transmitted, the packet travels over the 10 Gigabit Media Independent Interface (XGMII) to the PCS. In this thesis the XGMII is 64 bits wide, which gives a transfer frequency of 156.25 MHz at every pin.

Preamble (7 bytes) | SFD (1 byte) | MAC destination (6 bytes) | MAC source (6 bytes) | Length (2 bytes) | Payload (46-1500 bytes) | FCS (4 bytes) | IPG (≥12 bytes)

Figure 2.2: An Ethernet packet with the following IPG
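The field sizes in Figure 2.2 determine how often a new packet can start on the link. The following sketch adds up the fields for a given payload and converts the sum to a time at the 10 Gb/s data rate. It assumes the minimum IPG of 12 bytes and back-to-back packets; the function names are illustrative only.

```python
# Field sizes in bytes, taken from Figure 2.2.
PREAMBLE, SFD, MAC_DST, MAC_SRC, LENGTH, FCS = 7, 1, 6, 6, 2, 4
MIN_IPG = 12                      # minimum interpacket gap in bytes
DATA_RATE_BPS = 10_000_000_000    # 10 Gb/s at the XGMII


def wire_bytes(payload_bytes):
    """Bytes occupied on the link by one packet plus the following minimum IPG."""
    if not 46 <= payload_bytes <= 1500:
        raise ValueError("payload must be 46-1500 bytes")
    overhead = PREAMBLE + SFD + MAC_DST + MAC_SRC + LENGTH + FCS
    return overhead + payload_bytes + MIN_IPG


def packet_period_ns(payload_bytes):
    """Shortest time between the starts of two back-to-back packets, in ns."""
    return wire_bytes(payload_bytes) * 8 * 1e9 / DATA_RATE_BPS


print(wire_bytes(46), packet_period_ns(46))
print(wire_bytes(1500), packet_period_ns(1500))
```

With these assumptions the smallest packet occupies 84 bytes and the largest 1538 bytes of link time.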

2.1.2 Physical coding sublayer

The PCS [9] consists of one transmitting part and one receiving part. Both parts are shown in Figure 2.3. An important property of this block is that its input and output have different bit widths and frequencies, and it is up to the PCS to handle this.

In the PCS a group of 64 bits or 66 bits is called a block. The difference between the two is that the encoder in the transmitting path adds a sync header of 2 bits to the block. A block always contains 8 bytes of data.

[Figure: Physical Coding Sublayer between XGMII and PMA. Transmitter: FIFO, Encoder, Scrambler, Gearbox, with bus widths 64, 64, 66, 66 and W bits. Receiver: Block Sync, Descrambler, Decoder, FIFO, with bus widths W, 66, 66, 64 and 64 bits.]

Figure 2.3: A block diagram over the PCS

In this thesis the connection to the XGMII has a width of 64 bits and a frequency of 156.25 MHz, which gives the correct bandwidth of 10 Gb/s. The width of the connection to the physical medium attachment (PMA) sublayer is varied, and it is investigated how the accuracy depends on this bit width. The frequency of the 10GBASE-R line is 10312.5 MHz because of the 64b/66b transmission code. For a chosen bit width W, the internal frequency in the PCS (between the First In First Out (FIFO) buffer and the PMA) is given by Equation (2.1).

$$f_{PCS} = \frac{10312.5}{W}\ \text{MHz} \tag{2.1}$$
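Equation (2.1) can be evaluated for a few candidate bit widths. The widths listed below are examples chosen for illustration, not a list taken from the thesis.

```python
LINE_RATE_MHZ = 10312.5  # 10GBASE-R line rate


def pcs_frequency_mhz(width_bits):
    """Internal PCS frequency from Equation (2.1) for a PCS-to-PMA width W."""
    if not 1 <= width_bits <= 66:
        raise ValueError("W must be between 1 and 66 bits")
    return LINE_RATE_MHZ / width_bits


for w in (16, 20, 32, 40, 64, 66):
    print(f"W = {w:2d} bits: {pcs_frequency_mhz(w):.4f} MHz")
```

For W = 66 the result equals the 156.25 MHz of the XGMII side, and for W = 40 it is 257.8125 MHz.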

First In First Out

The FIFO is a buffer whose purpose is to make it possible to use two different clocks in the same unit. One of the clocks is used to write to the buffer while the other is used to read from it. A FIFO can be implemented as a ring buffer with one pointer for writing and one for reading. Figure 2.4 shows an example of a FIFO with N positions and 64 bits in each position. The difference between the write pointer (wr ptr) and the read pointer (rd ptr) is the number of occupied positions and is called the offset.

[Figure: ring buffer with addresses 1 to N and 64 bits per position, showing rd ptr, wr ptr, wr data, rd data, rd clk and wr clk]

Figure 2.4: FIFO with N positions and 64 bits in each position
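The ring buffer described above can be sketched in software as follows. This is a behavioral model only: it ignores the two clock domains and the synchronization between them, and the class and method names are hypothetical.

```python
class RingFifo:
    """Behavioral model of a FIFO with N positions of 64 bits each."""

    WORD_MASK = (1 << 64) - 1

    def __init__(self, positions):
        self._slots = [0] * positions
        self._wr_ptr = 0   # next position to write
        self._rd_ptr = 0   # next position to read
        self._offset = 0   # number of occupied positions

    @property
    def offset(self):
        """Occupied positions, i.e. the distance from rd ptr to wr ptr."""
        return self._offset

    def write(self, word):
        if self._offset == len(self._slots):
            raise OverflowError("FIFO full")
        self._slots[self._wr_ptr] = word & self.WORD_MASK
        self._wr_ptr = (self._wr_ptr + 1) % len(self._slots)
        self._offset += 1

    def read(self):
        if self._offset == 0:
            raise IndexError("FIFO empty")
        word = self._slots[self._rd_ptr]
        self._rd_ptr = (self._rd_ptr + 1) % len(self._slots)
        self._offset -= 1
        return word


fifo = RingFifo(positions=4)
for value in (0x11, 0x22, 0x33):
    fifo.write(value)
first = fifo.read()
print(hex(first), fifo.offset)
```

The time a word spends in the buffer is proportional to the offset seen when it is written, which is why the offset matters for the timing analysis later in the thesis.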

Encoder

The PCS uses a 64b/66b transmission code where two bits are added to the block. These bits, called the sync header, are either '01' for a data block or '10' for a control block. A data block contains 8 bytes of data while a control block can contain both data and control information. In a control block, the first byte indicates how the rest of the bits in the block shall be read. A table of the different block formats can be found in [9], Figure 49-7. Because the sync header is always '01' or '10' and never '11' or '00', there is always a transition every 66 bits.


Scrambler

The scrambler is used to give the signal a more random characteristic, which reduces long runs of 0 or 1. It is a self-synchronizing scrambler that uses the polynomial given in Equation (2.2). Figure 2.5 shows a serial implementation of the scrambler with this polynomial. In practice a parallel implementation is used, but that is harder to visualize. The sync header bypasses the scrambler since it is used in the block synchronization discussed below.

[Figure: serial shift register with stages 1 to 58; input "Data in", output "Scrambled data out"]

Figure 2.5: An example how a serial scrambler can be implemented, where the operators are XOR-gates

$$G(x) = x^{58} + x^{39} + 1 \tag{2.2}$$

Gearbox

The purpose of the gearbox is to change the size of the block. An incoming block is 66 bits wide while an outgoing block is W bits wide, where W ≤ 66. Both sides of the gearbox run at the same frequency and the same amount of data shall be transmitted on both sides. Because of that, some of the incoming blocks are invalid. For example, if W = 40 bits, a fraction of $\frac{66-40}{66} = \frac{13}{33}$ of the incoming blocks is invalid: in 33 clock cycles there are 20 valid and 13 invalid 66-bit blocks at the input (20 · 66 = 1320 bits) and 33 40-bit blocks at the output (33 · 40 = 1320 bits), where all the output blocks are valid.
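The gearbox example can be generalized to any width W by requiring that the number of bits entering and leaving the gearbox over one repeating period is equal. The sketch below does this with a greatest common divisor; the function name is illustrative.

```python
from math import gcd


def gearbox_period(width_bits, block_bits=66):
    """Return (cycles, valid_blocks, invalid_blocks) for one repeating period.

    In `cycles` clock cycles the output carries cycles * width_bits bits,
    which must equal valid_blocks * block_bits bits at the input.
    """
    if not 1 <= width_bits <= block_bits:
        raise ValueError("W must satisfy 1 <= W <= block size")
    g = gcd(block_bits, width_bits)
    cycles = block_bits // g
    valid_blocks = width_bits // g
    invalid_blocks = cycles - valid_blocks
    return cycles, valid_blocks, invalid_blocks


print(gearbox_period(40))   # (33, 20, 13)
print(gearbox_period(66))   # (1, 1, 0)
```

For W = 40 this reproduces the 13 invalid input blocks out of every 33 clock cycles.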

Block synchronization

The block synchronization uses the sync header to find the block boundaries and output 66-bit blocks. It relies on the fact that there is always a transition every 66 bits, independent of the type of data that is transferred. It can also be used as part of an error detector that raises an error if the sync header is '11' or '00'.


Descrambler

The purpose of the descrambler is to remove the effect of the scrambler in the transmitting path. This is done with the descrambler shown in Figure 2.6. The polynomial that is used for the scrambler, Equation (2.2), is used for the descrambler as well. The sync header bypasses the descrambler in the same way as it bypasses the scrambler.

[Figure: serial shift register with stages 1 to 58; input "Scrambled data in", output "Data out"]

Figure 2.6: An example how a serial descrambler can be implemented, where the operators are XOR-gates
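A bit-serial model of the scrambler and descrambler for the polynomial in Equation (2.2) can be sketched as below. It is an illustration of the self-synchronizing principle and not the parallel implementation used in hardware; the bit ordering and the function names are assumptions made here.

```python
import random

MASK58 = (1 << 58) - 1


def scramble(bits, state=0):
    """Serial scrambler: out[n] = in[n] XOR out[n-39] XOR out[n-58].

    Bit i of `state` holds the scrambled bit that was sent i + 1 bits ago.
    """
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK58
    return out, state


def descramble(bits, state=0):
    """Serial descrambler: out[n] = in[n] XOR in[n-39] XOR in[n-58].

    The shift register is fed with the received (scrambled) bits, so it
    reaches the same state as the scrambler after 58 received bits.
    """
    out = []
    for s in bits:
        b = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(b)
        state = ((state << 1) | s) & MASK58
    return out, state


rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(200)]
scrambled, _ = scramble(data, state=0)

# Same start state: the data is recovered completely.
recovered, _ = descramble(scrambled, state=0)
# Unknown start state: only the first 58 bits can be wrong.
resynced, _ = descramble(scrambled, state=MASK58)
print(recovered == data, resynced[58:] == data[58:])
```

The second check shows why the scrambler is called self-synchronizing: the receiver does not need to know the start state of the transmitter.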

Decoder

The decoder will decode the data that was encoded in the transmitting process. The 64b/66b decoding will remove the sync header and the output will be 64 bits, 8 bytes of data, which is the same as the input to the encoder in the transmitter.

2.2 Precision time protocol

In this document IEEE 1588-2008 [7] is referred to as PTP, and messages that are sent with this protocol are referred to as PTP messages. The protocol was developed to make it possible to synchronize time and phase over Ethernet. The first version was released in 2002 and the latest revision was released in 2008.

The protocol is built as a master and slave hierarchy where the slave synchronizes its time to its master. The synchronization is done with PTP messages that are sent from the master to the slave and contain the time of day (ToD).

2.2.1 Synchronization

In the protocol there are two kinds of PTP messages. There are ordinary messages and there are event messages. The difference between these are


The timestamp for an event message is taken when the message passes the reference point at the ingress and egress of the node. This reference point can be determined either with software or with help of hardware. If hardware support is chosen for the timestamps it can still be necessary to have some software to handle the synchronization process.

The synchronization can be done either by one-step or two-step. When using one-step synchronization the egress time will be embedded in the message that caused the timestamp. In two-step the egress time will be sent in a follow up message instead of embedding it in the event message.

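[Figure: message sequence between master and slave. Sync leaves the master at t1 and reaches the slave at t2, followed by Follow Up. DelayReq leaves the slave at t3 and reaches the master at t4, followed by Delay Resp.]

After message    Timestamps known by slave
Sync             t2
Follow Up        t1, t2
DelayReq         t1, t2, t3
Delay Resp       t1, t2, t3, t4

Figure 2.7: A two-step synchronization in PTP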

Figure 2.7 shows a two-step synchronization, and Equation (2.3) gives the calculation of the delay and the offset. The delay is the time it takes for a message to travel from the master to the slave (or from the slave to the master); the protocol assumes that the delay is symmetric. The offset is the difference in time between the slave and the master after compensating for the delay.

$$t_{delay} = \frac{(t_2 - t_1) + (t_4 - t_3)}{2} \tag{2.3a}$$

$$t_{offset} = t_2 - t_1 - t_{delay} \tag{2.3b}$$
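Equation (2.3) translates directly into code. In the sketch below the timestamps are plain numbers in nanoseconds, and the example values (a 500 ns symmetric path and a slave clock that is 150 ns ahead) are made up for illustration.

```python
def ptp_delay_and_offset(t1, t2, t3, t4):
    """Delay and offset according to Equations (2.3a) and (2.3b).

    t1: Sync sent (master clock)      t2: Sync received (slave clock)
    t3: DelayReq sent (slave clock)   t4: DelayReq received (master clock)
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = t2 - t1 - delay
    return delay, offset


# Hypothetical example: path delay 500 ns each way, slave 150 ns ahead.
true_delay, true_offset = 500, 150
t1 = 1000                                  # master time
t2 = t1 + true_delay + true_offset         # read on the slave clock
t3 = 2000                                  # read on the slave clock
t4 = (t3 - true_offset) + true_delay       # read on the master clock

delay, offset = ptp_delay_and_offset(t1, t2, t3, t4)
print(delay, offset)   # 500.0 150.0
```

With a symmetric path the calculation recovers both the delay and the offset exactly; the slave then corrects its clock by subtracting the offset.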

2.2.2 Different types of clocks

An ordinary clock (OC) can serve as either a slave or a master. When an OC is mentioned later in this thesis it refers to a slave unless stated otherwise.


The grandmaster (GM) clock is the master clock and holds the actual time that the rest of the system synchronizes its clocks to. Several clocks in the system can claim the GM position, but only one clock at a time can be the GM. To decide which clock is the most suitable GM, all clocks that can be a master send out an announce message. Each node then runs an algorithm called the best master clock algorithm (BMCA). The algorithm is designed so that all nodes make the same decision about which clock is the best master clock and becomes the GM. If several clocks have the same performance, the last comparison in the algorithm uses the clock identity as a tie-breaker, since each clock has a unique identity.
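The idea that every node reaches the same decision can be illustrated with a strongly simplified comparison. The sketch below orders candidates by the attributes carried in announce messages, with lower values winning and the clock identity as the final tie-breaker. It is not the full BMCA: the topology-related steps and the port state decisions of IEEE 1588 are left out, and the class and field names are chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnouncedClock:
    """Subset of the attributes a master-capable clock announces."""
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: bytes   # unique per clock, used as the last tie-breaker

    def rank(self):
        # Tuples compare element by element, so earlier fields dominate.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def best_master(candidates):
    """Pick the candidate with the lowest rank (simplified comparison)."""
    return min(candidates, key=AnnouncedClock.rank)


# Hypothetical candidates: a and b are equal except for their identity.
a = AnnouncedClock(128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("0000000000000002"))
b = AnnouncedClock(128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("0000000000000001"))
c = AnnouncedClock(128, 248, 0xFE, 0xFFFF, 128, bytes.fromhex("0000000000000000"))

# Nodes that see the announce messages in different orders agree.
print(best_master([a, b, c]) == best_master([c, b, a]) == b)
```

Because the comparison depends only on the announced values and not on the order in which they arrive, every node that sees the same candidates selects the same GM.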

A boundary clock (BC), which can be seen in Figure 2.8, is a clock that serves as a slave on one of its ports and synchronizes its local clock to the master. The BC then acts like a master with its local clock as the reference time to the rest of the system. This clock is useful in a switch or router in a bigger network to reduce the workload from the GM and reduce the time error through the switch or router.

[Figure: a BC with one slave port and two master ports]

Figure 2.8: Illustration of a boundary clock

The transparent clock (TC), which can be seen in Figure 2.9, is another way to reduce the time error through a switch or router. Compared to the BC this does not have a local clock that needs to be synchronized to its master. Instead, the TC calculates the time a packet spends in the switch or router and compensates for it. A TC can be combined with an OC that synchronizes to the GM to support a network element and in that case not only serve as a switch or router.

[Figure: a TC]

Figure 2.9: Illustration of a transparent clock
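The compensation done by a transparent clock can be sketched as an accumulated correction value that travels with the message. The example assumes an end-to-end style correction where each TC adds its residence time and the slave subtracts the sum; the function names and the numbers are hypothetical.

```python
def add_residence_time(correction_ns, ingress_ns, egress_ns):
    """Add the time a message spent inside one TC to the running correction.

    Only the difference between the two local timestamps is used, so the
    TC does not need to know the time of day.
    """
    return correction_ns + (egress_ns - ingress_ns)


def corrected_transit_ns(t1_ns, t2_ns, correction_ns):
    """Master-to-slave transit time with the residence times removed."""
    return (t2_ns - t1_ns) - correction_ns


# Hypothetical chain: three 50 ns links and two TCs in between.
correction = 0
correction = add_residence_time(correction, ingress_ns=10_000, egress_ns=10_700)
correction = add_residence_time(correction, ingress_ns=83_000, egress_ns=84_200)

t1 = 0
t2 = 3 * 50 + 700 + 1200   # arrival time at a slave that is already in sync
print(correction, corrected_transit_ns(t1, t2, correction))   # 1900 150
```

After the correction only the link delays remain, so the queueing inside the forwarding nodes no longer disturbs the delay measurement.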

2.3 Synchronous Ethernet

SyncE is a recommendation from ITU-T on how to deliver a frequency in a network; the recommendations are described in [4] and [5]. According to the recommendation the frequency is recovered from the bit stream in the physical layer. The clock that is distributed in the chain is called the primary reference clock (PRC) and all clocks in the network shall be traceable to that clock. To get a traceable clock, all nodes in a chain between the master and the end device need to be implemented with a synchronous Ethernet equipment clock (EEC) according to the SyncE recommendations. The performance of the recovered clock does not depend on the network load since it does not synchronize with any specific packet [16]. Figure 2.10 presents a small network that uses SyncE.

[Figure: three nodes (Master, Slave, Slave), each with higher layers and PHYs; a PRC at the master, an EEC in each slave, and Sync passed between neighboring nodes]

Figure 2.10: Three nodes implemented with Synchronous Ethernet

2.4 Expressions

2.4.1 Parts per million

Parts per million (ppm) is a measure of how accurate a clock is. It states how much the clock frequency is allowed to deviate from its specification or from another clock. For example, a clock with a frequency of 250 MHz and an accuracy of 100 ppm will have a frequency of $250 \pm 250 \cdot \frac{100}{10^{6}}$ MHz, which gives a minimum frequency of 249.975 MHz and a maximum frequency of 250.025 MHz.
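The ppm example can be reproduced with a small helper; the function name is illustrative.

```python
def frequency_bounds_mhz(nominal_mhz, accuracy_ppm):
    """Lowest and highest frequency allowed by an accuracy given in ppm."""
    deviation = nominal_mhz * accuracy_ppm / 1e6
    return nominal_mhz - deviation, nominal_mhz + deviation


low, high = frequency_bounds_mhz(250, 100)
print(f"{low:.3f} MHz to {high:.3f} MHz")   # 249.975 MHz to 250.025 MHz
```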

2.4.2 Free running

A free running clock is a clock that is not synchronized to any other clock or system. This means that two similar clocks that are in free run mode can have slightly different frequencies and most likely different phases.

2.4.3 Hop

A hop in a computer network is one part of the path through the network, where a packet passes through a forwarding node, for example a router. The total number of hops between a slave and a master is the number of nodes a packet needs to go through before it reaches its final destination. Figure 1.2 shows how this expression is used.

2.4.4 Topology

In this thesis a topology refers to a network topology, which describes how the nodes are arranged in the network and which nodes are linked to each other. The topology can show how data packets are sent from one node to another and which data path a packet can take to reach its final destination.

2.4.5 Ingress

The ingress is the input path in the node when data is received.

2.4.6 Egress


Chapter 3

Time Accuracy

3.1 Problems

In Sections 3.1.1-3.1.4 the system contains only two nodes, as shown in Figure 3.1. The left node acts as a master and the right node serves as a slave. With this smaller system it is easier to describe the timing problems that occur between two nodes. In Section 3.1.5 a bigger system is used to describe the problems that occur when a cascaded system is used and the time information needs to be forwarded through one or several nodes.

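[Figure: Master and Slave connected by a link, with the delay tms from master to slave and the delay tsm from slave to master]

Figure 3.1: A simple system of two nodes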

3.1.1 Reference point

When looking at the timing between two nodes, the timestamp reference point has a large impact on the accuracy. The reference point can be placed in software, either in an application or in an interrupt handler, or hardware support can be used to determine the timestamps. The different methods are shown in Figure 3.2. If the reference point is placed in software it is hard to know the path and the latency from the communication medium to the reference point, which is symbolized by the cloud. In most cases a PTP software block is necessary even if the timestamp is determined with hardware support.

[Figure: PTP software block, PHY and communication medium, with the hardware reference point and the software reference point marked]

Figure 3.2: Different choices for the PTPs reference point

A software solution is cheap and easy to implement since no specific hardware is needed, but it is difficult to estimate the delay from when the message arrives until the software reads the message and takes the timestamp. In a software solution the timestamp can be taken in an application or, in the best case, in an interrupt.

A hardware solution is closer to the communication medium, which makes it easier to estimate the delay between the physical medium and the point where the timestamp is taken. It needs some hardware assistance, not only to take the timestamp but also to take a fingerprint. The fingerprint is used as an ID to match the timestamp with its PTP packet in the PTP software block.

Another problem with the hardware solution is that the timestamp is taken when the SFD byte of the Ethernet packet passes the reference point. At this moment the system does not know whether the message is an event message or not. That information arrives later in the PTP header, which is located first in the Ethernet payload according to Figure 2.2.

3.1.2 Resolution of the timestamp

The ToD is stored like a counter that will update the time each clock cycle. This means that the time will have the same resolution as the period time of the clock.

3.1.3 Asymmetry

PTP assumes that the delay from the master to the slave ($t_{ms}$) is the same as the delay from the slave to the master ($t_{sm}$). If this is not the case, the asymmetry will contribute with a time error in the range of $\frac{t_{ms}-t_{sm}}{2}$, which follows from inserting the two delays into Equation (2.3). The asymmetry can be divided into the asymmetry of the communication medium and the asymmetry of the node.
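The effect of an asymmetric path on the PTP calculation can be checked numerically by applying Equation (2.3) to a slave that is already perfectly synchronized. The delays used below are made-up example values.

```python
def offset_error_ns(t_ms, t_sm, turnaround_ns=1000):
    """Offset reported by Equation (2.3) for a perfectly synchronized slave.

    t_ms: delay from master to slave, t_sm: delay from slave to master.
    Any non-zero result is an error caused purely by the asymmetry.
    """
    t1 = 0
    t2 = t1 + t_ms               # both clocks show the same time
    t3 = t2 + turnaround_ns      # the slave answers a little later
    t4 = t3 + t_sm
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return t2 - t1 - delay


print(offset_error_ns(510, 490))   # 10.0
print(offset_error_ns(500, 500))   # 0.0
```

In this model the error equals half the difference between the two delays, so a symmetric path gives no error regardless of how long the delay is.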

The asymmetry in the communication medium depends on the medium that is used. If optical fiber is used, the asymmetry can come from the different lasers that are used. This asymmetry is individual for each cable and the delay may even vary with the temperature of the cable's environment.

PTP assumes that the timestamp is measured at the timestamp point, but that is not possible since the message is scrambled at that point. The timestamp is instead measured at the reference point, and there is a latency between the reference point and the timestamp point. In Figure 3.3 this latency is called transmitting latency or receiving latency, depending on whether the message is transmitted or received. If the two latencies were the same, the latency error would be eliminated; when they differ, the difference contributes to an asymmetry error.

[Figure: transmitting block and receiving block connected by the communication medium. In each block the latency between the reference point and the timestamp point is marked as transmitting latency and receiving latency respectively.]

Figure 3.3: Showing the asymmetry in a node

If the asymmetry is known there is a way to compensate for it, but if it varies then it is much harder to correct it. In that case it is only possible to partly reduce the error. In the PCS there is a variable delay that occurs in the FIFO and in the gearbox.

Variable delay in the FIFO buffer

The input and output of the FIFO use two different frequencies but the same bit width, which means that some data on the side with the higher frequency is invalid. As assumed previously in this thesis, one side of the FIFO has a bit width of 64 bits and a frequency of 156.25 MHz. The other side has the same bit width of 64 bits but a frequency that depends on the bit width between the PCS and the PMA. This relation is given by Equation (2.1). Because of the different frequencies and the invalid data, the delay in the FIFO is different for each block of data. This is a variable delay that occurs due to the different frequencies.

Figure 3.4 shows an example where the bit width between the PCS and the PMA is 40. With this bit width the internal frequency is 257.8125 MHz. The time a data block spends in the FIFO is marked red and has a value of 64 in "bits in FIFO". The figure only shows the difference in the number of bits in the FIFO and not the absolute value: where it shows 0 bits, this represents the lowest fill level of the FIFO and not necessarily 0. A preferred solution is to store an integer number of periods in the FIFO. In this case, with a bit width of 40, a period is 128 ns. That corresponds to 20 cycles of Clk1, and an integer number of periods in the FIFO is N · 1280 bits, where N is an integer. The grey area in the figure represents the invalid data blocks that are transported internally in the PCS.

[Figure: timing diagram with the rows Clk1 (156 MHz), Clk2 (258 MHz), Data1 in to FIFO, Bits in FIFO (alternating between 0 and 64) and Data2 out of FIFO]

Figure 3.4: An example where data1 writes to the FIFO with clock1 and data2 reads from the FIFO with clock2
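The length of the repeating pattern in the FIFO follows from the ratio between the two clocks. The sketch below uses exact fractions to find the smallest number of cycles after which both clocks line up again; it assumes the 156.25 MHz XGMII clock and Equation (2.1) for the other side, and the function name is illustrative.

```python
from fractions import Fraction

CLK1_MHZ = Fraction(15625, 100)        # 156.25 MHz, XGMII side
LINE_RATE_MHZ = Fraction(103125, 10)   # 10312.5 MHz


def fifo_period(width_bits):
    """Return (clk1_cycles, clk2_cycles, period_ns, bits_per_period)."""
    clk2_mhz = LINE_RATE_MHZ / width_bits      # Equation (2.1)
    ratio = clk2_mhz / CLK1_MHZ                # reduces to 66 / W
    clk1_cycles = ratio.denominator
    clk2_cycles = ratio.numerator
    period_ns = clk1_cycles * 1000 / CLK1_MHZ  # cycles times period in ns
    bits_per_period = clk1_cycles * 64         # 64 bits written per Clk1 cycle
    return clk1_cycles, clk2_cycles, period_ns, bits_per_period


print(fifo_period(40))
```

For a bit width of 40 this gives 20 cycles of Clk1, 33 cycles of Clk2, a period of 128 ns and 1280 bits written per period.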

Variable delay in the gearbox

The gearbox has an input of 66-bit blocks and an output bit width that is the same as the bit width between the PCS and the PMA. The gearbox therefore has a buffer whose fill level varies over time. The time from the input of the gearbox to the communication medium depends on how full the buffer is. The buffer uses the same clock for reading and writing, so this delay can be compensated for by reading out how full the buffer is.

3.1.4 Frequency accuracy

The accuracy of the frequency can also contribute to a time error and therefore needs to be mentioned. If a system has a requirement on the time accuracy, a highly accurate local clock is probably used. In such a system there might not be any problem with the frequency accuracy. Since there is no specified requirement on the local clock in this thesis, it is necessary to take this error into account as well.

The frequency error occurs when the frequency in the master clock is different from the frequency in the slave. In this thesis an assumption is [...] give an accuracy better than $10^{-13}$ [16]. If the slave node, for example, uses a local clock with an accuracy of ±100 ppm (which is the requirement for Ethernet), the difference in time between the master and the slave can be 100 µs after one second.

3.1.5 Packet delay variation

The previously mentioned problems occur in a single node, so all of them contribute a time error in each node of a cascaded system. In a cascaded system it is also necessary to consider the time a packet spends inside each node. In a network with nodes that forward packets (switches and routers), a packet is received, placed in a queue and then transmitted again. The time it spends inside a node depends in part on the queue, and the variation in this time is called packet delay variation. With PTP there are two methods to implement a forwarding node that handles this delay; they are discussed further in Section 3.2.4. PTP can also be implemented without handling this delay at all, but the accuracy of the synchronization and the delay measurement will then decrease.

3.2 Possible solutions

3.2.1 Reference point

To get a good accuracy, the timestamp at the reference point needs to be taken with hardware support. Since the timestamp is taken when the SFD passes the reference point, all messages need to be timestamped. After the PTP header has been read, a decision can be made whether the timestamp shall be passed to the PTP software block or discarded.

The best place to put the hardware is after the PCS, because before the PCS the messages are still scrambled and no information can be read from them. It is good to have the reference point close to the communication medium to get the lowest variable delay and the best possible time accuracy. Therefore the best point to take the timestamp is at the XGMII, while the data is transferred between the PCS and the MAC.

The fingerprint is used to pair the event message with its timestamp in the software block. It can be built from the sequenceId, the message type and the source address. The sequenceId is a number that increases for each transmission of a specific message. By saving a fingerprint together with the timestamp and comparing it with the incoming event message in the software block, the correct timestamp can be attached to the correct message.
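The fingerprint matching can be sketched as a small lookup table. This is only an illustration under assumptions: the class `TimestampStore`, its method names and the nanosecond integer timestamps are hypothetical and not taken from any PTP implementation.

```python
from typing import Dict, Optional, Tuple

# Fingerprint as described in the text: (sequenceId, message type, source address)
Fingerprint = Tuple[int, int, bytes]


class TimestampStore:
    """Holds hardware timestamps until the software block sees the message."""

    def __init__(self) -> None:
        self._pending: Dict[Fingerprint, int] = {}

    def save(self, fingerprint: Fingerprint, timestamp_ns: int) -> None:
        # Called by the (hypothetical) hardware interface for every PTP event message.
        self._pending[fingerprint] = timestamp_ns

    def match(self, fingerprint: Fingerprint) -> Optional[int]:
        # Called by the software block; returns and removes the matching
        # timestamp, or None if no timestamp was stored for this message.
        return self._pending.pop(fingerprint, None)


if __name__ == "__main__":
    store = TimestampStore()
    source = bytes.fromhex("001122334455")
    store.save((17, 0x0, source), 1_000_000_123)
    print(store.match((17, 0x0, source)))
    print(store.match((18, 0x0, source)))
```

A real design would also need to discard stale entries, for example when a message is lost before it reaches the software block.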

3.2.2 Resolution of timestamp

The clock that updates the ToD used for the timestamp gives the timestamp a resolution equal to its period. The ToD is incremented by the period every clock cycle to keep track of the time. An oscillator that can be used for this purpose is a free running 125 MHz clock, which gives a resolution of 8 ns [13]. It is also possible to use faster oscillators, such as a 250 MHz clock, which gives a resolution of 4 ns.

Instead of using a free running clock for the ToD, a recovered clock can be used. This has the same problem with the resolution, but instead of a period of 8 or 4 ns the period will be a fractional value. By choosing a bit width of 16 bits between the PCS and PMA, the clock frequency will be 644.53 MHz and the resolution of the accumulated time will be about 1.6 ns.
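The resolutions mentioned above follow directly from the clock period. A minimal sketch, where the function name `tod_resolution_ns` is hypothetical and the recovered clock is assumed to be the 10.3125 GHz line rate divided by the bit width:

```python
from fractions import Fraction


def tod_resolution_ns(clock_hz) -> float:
    """Timestamp resolution in ns, equal to one period of the ToD clock."""
    return float(Fraction(10**9) / Fraction(clock_hz))


if __name__ == "__main__":
    print(tod_resolution_ns(125_000_000))   # free running 125 MHz clock
    print(tod_resolution_ns(250_000_000))   # free running 250 MHz clock
    # Recovered clock for a 16-bit interface between the PCS and the PMA
    print(tod_resolution_ns(Fraction(10_312_500_000, 16)))
```

The recovered clock gives a period of roughly 1.55 ns, which the text rounds to 1.6 ns.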

3.2.3 Asymmetry

The asymmetry was divided into asymmetry inside the node and asymmetry between two nodes. According to Section 1.3.3, the only asymmetry that is important is the one that occurs inside the node. But when PTP is used, the delay is measured with the help of the timestamps, as the time between two reference points in two different nodes. This means that the asymmetry in the medium will contribute a time error to the PTP delay measurement. With that in mind, it might be necessary to move the time budget from the communication medium to the node or to a delay measurement budget.

By knowing the fixed latency between the timestamp point and the communication medium, it can be compensated for by using Equation (3.1), where T stands for transmit and R for receive. Only the fixed delay can be totally removed by this equation. A variable delay can only be reduced by estimating the average delay, which at best halves the maximum error.

$$T_{\mathrm{Timestamp}} = T_{\mathrm{MeasuredTimestamp}} + T_{\mathrm{Latency}} \tag{3.1a}$$

$$R_{\mathrm{Timestamp}} = R_{\mathrm{MeasuredTimestamp}} - R_{\mathrm{Latency}} \tag{3.1b}$$
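The compensation in Equation (3.1) can be illustrated with a few lines of code. The function names are hypothetical, and the midpoint estimate of a variable delay is an assumption about how "the average delay" could be chosen when only the minimum and maximum delay are known.

```python
def corrected_tx_timestamp(measured_ns: float, latency_ns: float) -> float:
    """Equation (3.1a): the message leaves the node after the timestamp point."""
    return measured_ns + latency_ns


def corrected_rx_timestamp(measured_ns: float, latency_ns: float) -> float:
    """Equation (3.1b): the message entered the node before the timestamp point."""
    return measured_ns - latency_ns


def midpoint_latency(min_delay_ns: float, max_delay_ns: float):
    """Return (latency to compensate with, remaining worst-case error).

    Using the midpoint of a variable delay halves the maximum error.
    """
    latency = (min_delay_ns + max_delay_ns) / 2
    residual = (max_delay_ns - min_delay_ns) / 2
    return latency, residual


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns fixed latency plus 0 to 1.36 ns variable delay
    latency, residual = midpoint_latency(10.0, 11.36)
    print(corrected_rx_timestamp(1_000.0, latency), residual)
```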

In Figure 3.4 there was an example of the variable delay in the FIFO. When hardware support is used for the timestamp and it is located at the XGMII as discussed previously, the delay from the PCS is the only variable delay before the timestamp reference point. The delay depends on the frequency, which in turn depends on the bit width to the PMA. By studying different bit widths and calculating the variable delay, it can be seen that with a higher frequency (smaller bit width) the variable delay will decrease.

[...] and the longest time a data block is stored in the FIFO. To get this result, the data is periodically transferred through the PCS in a cycle of 33 clock cycles. The valid blocks are spread out over the whole period to get as equal a time in the FIFO as possible. The FIFO is presumed to hold data for at least one period at the start.

[Bar chart "Variable delay in FIFO": variable delay in ns (axis from 0.00 to 7.00) for the bit widths 64, 40, 32, 20, 16, 10 and 8]

Figure 3.5: The chart shows the variable delay in the FIFO for different bit widths between PCS and PMA

The variable delay in the gearbox will also depend on the bit width. Because the same clock is used for both input and output in the gearbox, the variable delay will be cyclic. The cycle will always contain 33 clock cycles for the bit widths considered here. The SFD that shall be timestamped will always be at the same position in an incoming 66-bit block. The only thing that can differ is how full the gearbox is when the block arrives. Since the delay depends on how full the buffer is, and this pattern repeats every 33 cycles, it is possible to compensate for it with a variable value that depends on the number of bits in the buffer.
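That the cycle contains 33 clock cycles for every bit width in Figure 3.5 can be verified numerically. The sketch below assumes that the fill level of the gearbox buffer repeats once a whole number of 66-bit blocks equals a whole number of output words, that is, at the least common multiple of 66 and the output width. The function name `gearbox_cycle` is hypothetical.

```python
from math import gcd


def gearbox_cycle(out_width: int, block_bits: int = 66):
    """Return (output clock cycles, input blocks) until the fill level repeats."""
    common_bits = block_bits * out_width // gcd(block_bits, out_width)  # lcm
    return common_bits // out_width, common_bits // block_bits


if __name__ == "__main__":
    for width in (64, 40, 32, 20, 16, 10, 8):
        cycles, blocks = gearbox_cycle(width)
        print(f"width {width:2d}: {cycles} clock cycles, {blocks} blocks of 66 bits")
```

The number of clock cycles is 33 for all of these widths because each of them is even and shares no other factor with 66.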

According to the Ethernet standard [9], 16 bits is the original bit width between the PCS and PMA. This gives a variable delay of 1.36 ns from the FIFO. If it is compensated with the help of Equation (3.1), the contribution to the time error will be 0.68 ns. This value is for a well designed FIFO with highly accurate clocks for reading and writing. If the clock for either writing or reading is less accurate, the clocks can start to drift against each other, which will contribute to a bigger time error.

3.2.4 Precision time protocol implementation

The implementation of PTP can vary depending on how accurate the protocol needs to be and how much it is allowed to cost. In this document we have already assumed that the timestamp point will be in the hardware between the PHY and the MAC. Another thing that can be assumed about the system is that only one node is considered to be the master node. Because of that, the announce message, which is sent out by all master nodes so that the GM can be decided, is not that important. The BMAC that all nodes use to decide which node is the best master is not of any great interest either. Therefore these will only be mentioned and not treated in as much detail as the rest of the implementation choices.

Boundary clock vs Transparent clock

If a node is not only a slave but also forwards timing information further down in the system, it needs some more functionality. To make it accurate it can be designed with either BCs or TCs, or with a combination of both. First the two are compared for the selected system, and then there is a short discussion of which solution is preferred for the given system.

When the system uses BCs, each node synchronizes its local clock to its master. In a cascaded system each node depends on the previous clock. Each node has a control loop, and by cascading these loops, jitter and time error accumulate through the system. Therefore a highly cascaded network with BCs can have accuracy problems. The error can be reduced by using high quality oscillators, but that is an expensive solution and does not remove the problem with the cascaded control loops. A better solution is to use the TC, which is more suitable for the cascaded network because it does not contain any control loop [10].

The BC is more suitable in systems with a tree topology, where one input leads to several outputs that shall be fed. The BC then moves workload from the GM to the BC, because sync and delay request messages do not pass through the BC. The BC synchronizes to the master as an OC and then acts as a master and sends out separate sync and delay request messages to the rest of the system. In that way the master does not need to know what the system looks like after the BC.

[Network diagram: a GM connected to two OCGM nodes and one BCGM node, and the BC connected to three OCBC nodes; ports are marked S (slave) and M (master)]

Figure 3.6: A network where a boundary clock is useful to reduce the workload on the grandmaster

Figure 3.6 shows how a BC can be used. All the end devices have an OC that is synchronized to its master. The OCGM and the BCGM are synchronized to the GM, and the OCBC are synchronized to the BC. The GM only synchronizes 3 nodes instead of the 5 that would be necessary without the BC.

The TC does not need to synchronize to its master. Instead it keeps track of how long the packets spend inside the node and puts this time (the correction time) in the correction field of the PTP message. If two-step synchronization is used, the TC includes the correction time in the follow up message and the delay response message. If one-step synchronization is used, the correction time is added directly to the sync message and the delay request message. Figure 3.7 shows how the TC obtains the correction time for the correction field.

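[Block diagram: a message passes from ingress through the residence time bridge to egress; the local time provides the ingress timestamp and the egress timestamp, and their difference is the correction time]

Figure 3.7: The time estimation in a transparent clock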

Since the local clock does not need to synchronize to the master, it does not have the same problem as the BC with the cascaded control loops. The nodes do not depend on each other as they did with BCs; the only time error arises if the clock is free running and estimates the residence time incorrectly. For example, if an oscillator with an error of 100 ppm is used and the latency through a TC is 1 µs, the maximum time error for the TC will be

$$\frac{100}{10^{6}} \cdot 1 \cdot 10^{-6}\,\mathrm{s} = 100\,\mathrm{ps}.$$
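The behaviour of a transparent clock and the size of its error can be sketched as follows. The function names are hypothetical, and the correction field is simplified to a plain nanosecond value for the illustration.

```python
def add_residence_time(correction_ns: float, ingress_ns: float, egress_ns: float) -> float:
    """Add the time a message spent in the node to its correction value."""
    return correction_ns + (egress_ns - ingress_ns)


def residence_time_error_ps(residence_time_us: float, oscillator_error_ppm: float) -> float:
    """Worst-case error of the measured residence time.

    1 ppm of 1 µs is 1e-6 * 1e-6 s = 1 ps, so the product is already in ps.
    """
    return residence_time_us * oscillator_error_ppm


if __name__ == "__main__":
    # Two cascaded TCs, each adding its own residence time (hypothetical values)
    correction = 0.0
    correction = add_residence_time(correction, ingress_ns=5_000.0, egress_ns=6_000.0)
    correction = add_residence_time(correction, ingress_ns=9_200.0, egress_ns=10_050.0)
    print(correction)
    print(residence_time_error_ps(1.0, 100.0))
```

Because each TC only measures a short interval with its local clock, the error stays small even with a free running oscillator, and it does not pass through a control loop.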

If a network element shall be used in a node with a TC, it needs to have an OC attached to it. This OC keeps track of the ToD, which an ordinary TC does not do, but it does not affect nodes further down in the system.

In most cases the TC is preferred, especially in a highly cascaded system. In a big tree topology it can be useful to have some BCs to reduce the workload on the GM.

In [14] the accuracy of the BC v1 and the TC is studied in a highly cascaded network with up to 30 nodes. The results show that the TC gets a much lower maximum jitter than the BC. At 10 nodes, which is the case of interest for this thesis, the TC implementation is about three times better than the BC implementation.

One-step vs Two-step

Two-step synchronization is shown in Figure 2.7, where the timestamp of the sync message is sent in a follow up message and not embedded. If one-step synchronization is used instead of two-step, the timestamp needs to be embedded in the sync message, and the follow up message is therefore removed. If the timestamps are generated at hardware level, they need to be embedded in hardware as well. Therefore the timestamps will be taken [...] reference point. The FCS in the Ethernet frame must also be recalculated in all the forwarding nodes when one-step synchronization is used. Therefore it is easier to use two-step synchronization when the timestamp reference point is at hardware level.

The same reasoning can be used with the Pdelay response follow up message if Peer-to-Peer (P2P) is used.

End-to-End vs Peer-to-Peer

Provided that the TC is selected, it can be implemented with either End-to-End (E2E) or P2P delay estimation. Both use the same technique to send synchronization messages but differ in how they deal with the delay estimation. Figure 3.8 shows the different messages for each type of delay estimation.

[Message diagram between GM, TC, TC and OC: Delay Req and Delay Resp for E2E; Pdelay Req, Pdelay Resp and Pdelay Resp Follow Up for P2P]

Figure 3.8: The difference between End-to-End (solid line) and Peer-to-Peer (dotted line) delay estimation

When E2E is used, the master clock gets a delay request message from each slave and sends out a delay response message as an answer to each request. The slave then calculates the path delay between the master and itself. With this implementation it is not necessary to have any PTP-specific routers or switches in the system; it is enough to have PTP implemented at the end devices that send and receive the synchronization. But if there are routers or switches that do not have PTP implemented, the accuracy will be heavily reduced, especially in highly loaded networks.

If P2P is used instead of E2E, each node only communicates with its neighbor to calculate the path delay. Each node sends a Pdelay request message to the previous node, which answers with a Pdelay response (and a Pdelay response follow up message if two-step synchronization is used). In this way the receiving node knows the delay from its neighbor and can compensate for it when receiving a synchronization message.

With P2P there will only be 3 messages (2 if one-step is used) between each pair of neighboring nodes to do one delay calculation. In E2E the number of messages between two nodes increases with the total number of nodes. For E2E the average number of PTP delay messages between two nodes is the same as the number of nodes in the cascaded system. The workload is also moved from the GM to the slaves when P2P is selected. Therefore P2P can be preferable in a bigger system.

As an example, consider a cascaded system with 10 nodes. The total number of delay messages needed to update the path delays would be 90 for E2E, while it would only be 27 for P2P. It is even lower if one-step synchronization is used.
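The message counts in this example can be reproduced with a short calculation. It assumes a chain with one GM followed by the slaves, and it counts every time a delay message crosses a link between two neighboring nodes. The function names are hypothetical.

```python
def e2e_delay_messages(nodes: int) -> int:
    """Link crossings for one E2E delay measurement per slave in a chain.

    A slave that is k hops from the GM sends a delay request and receives
    a delay response, and each of them crosses k links.
    """
    return sum(2 * hops for hops in range(1, nodes))


def p2p_delay_messages(nodes: int, two_step: bool = True) -> int:
    """Link crossings for one P2P delay measurement on every link."""
    per_link = 3 if two_step else 2  # Pdelay Req, Pdelay Resp (+ Follow Up)
    return per_link * (nodes - 1)


if __name__ == "__main__":
    n = 10
    links = n - 1
    print(e2e_delay_messages(n), e2e_delay_messages(n) / links)
    print(p2p_delay_messages(n), p2p_delay_messages(n, two_step=False))
```

With 10 nodes the E2E count of 90 spread over 9 links gives an average of 10 messages per link, which matches the statement that the average equals the number of nodes.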

When two-step synchronization is used, there is another advantage of using P2P instead of E2E. When a delay request message passes through a node, a timestamp is taken at ingress and egress. The difference is added to the correction field of the delay response message. Therefore a node needs to store the correction time until the delay response message passes the same node on the way back. In a chain topology a node may need to store several different correction times for several different nodes further down in the chain. This requires extra memory and extra logic to handle the correction time for an E2E solution.

Message interval

There are 3 messages that need to have a message interval defined: sync, delay request (or Pdelay request) and announce. Follow up is sent after a sync message and will therefore have the same interval as the sync message. Delay response is an answer to the delay request. If P2P is used, the Pdelay response follow up message serves as a follow up message to the Pdelay response and has the same interval as Pdelay request and Pdelay response.

The message interval is set with an 8-bit two's complement number, which is the base-2 logarithm of the interval in seconds. If the sync message interval is set to -4, there will be $2^{-4}$ s between sync messages, which is the same as 16 sync messages each second. The protocol also defines this time to be the average of the message interval, and each interval will with 90% confidence not differ by more than 30% from the average. This is necessary information when discussing the worst case scenario, but when estimating the accuracy the average can be used, and it will be used further in this document.
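The relation between the logarithmic interval value and the message rate can be written out as a small helper. The function names are hypothetical; the range check reflects the 8-bit two's complement encoding mentioned in the text.

```python
def message_interval_s(log_interval: int) -> float:
    """Average interval in seconds for a base-2 logarithmic interval value."""
    if not -128 <= log_interval <= 127:
        raise ValueError("value does not fit in an 8-bit two's complement number")
    return 2.0 ** log_interval


def messages_per_second(log_interval: int) -> float:
    """Average message rate for a base-2 logarithmic interval value."""
    return 1.0 / message_interval_s(log_interval)


if __name__ == "__main__":
    for value in (0, -3, -4, -5, -7):
        print(value, message_interval_s(value), messages_per_second(value))
```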

The announce message is used to decide which master clock will serve as the GM. Since the system in this document only has one GM, the announce message cannot be used to change master; it can only indicate whether the network is broken or not. Therefore it is not that important to have a high message rate for the announce messages.

[...] PTP clock with an accuracy of 100 ppm relative to the master. The slave can have an error of 100 µs after a second. But if the sync message rate is 32 messages/s, the error is reduced by a factor of 32. This results in a maximum error of 3.125 µs, independent of how long the system has been running.
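The effect of the sync rate on the accumulated error can be checked with a one-line calculation. The sketch assumes that the slave time is fully corrected at every sync message and drifts freely in between. The function name is hypothetical.

```python
def max_error_between_syncs_us(freq_error_ppm: float, sync_rate_per_s: float) -> float:
    """Largest time error in µs that builds up between two sync messages.

    1 ppm over 1 s is 1 µs, so the drift per second in µs equals the ppm value.
    """
    return freq_error_ppm / sync_rate_per_s


if __name__ == "__main__":
    for rate in (1, 8, 16, 32, 64, 128):
        print(rate, max_error_between_syncs_us(100.0, rate))
```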

The delay request and the delay response are used to measure the delay between two nodes. The contribution of the asymmetry will end up in these delay measurements. Even if the communication medium is handled in a separate budget, it will be hard to determine where the asymmetry occurs with the help of these measurements. One advantage is that the delay measurement is done continuously, so if the delay changes (for example with the temperature) the measurement will notice the change.

Message                  PTP message size   Ethernet packet size
Announce                 64                 102
Sync                     44                 84
Follow Up                44                 84
Delay Req                44                 84
Delay Resp               54                 92
Pdelay Req               54                 92
Pdelay Resp              54                 92
Pdelay Resp Follow Up    54                 92

Table 3.1: The message sizes for the PTP messages, in bytes

In Table 3.1 the size of each message can be seen. The PTP messages that only contain 44 bytes need 2 extra padding bytes to reach Ethernet's minimum payload of 46 bytes. The Ethernet packet size includes the preamble and a 12-byte IPG.
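The Ethernet packet sizes in Table 3.1 can be reproduced from the PTP message sizes. The sketch assumes that PTP is carried directly in an untagged Ethernet frame with a 14-byte header, a 4-byte FCS and 8 bytes of preamble including the SFD. This split is not itemized in the text and is only an assumption that happens to match the table.

```python
PREAMBLE_AND_SFD = 8   # bytes, assumption: 7 bytes preamble + 1 byte SFD
ETH_HEADER = 14        # bytes, destination + source address + EtherType
MIN_PAYLOAD = 46       # bytes, minimum Ethernet payload
FCS = 4                # bytes, frame check sequence
IPG = 12               # bytes, inter-packet gap

PTP_MESSAGE_SIZES = {
    "Announce": 64,
    "Sync": 44,
    "Follow Up": 44,
    "Delay Req": 44,
    "Delay Resp": 54,
    "Pdelay Req": 54,
    "Pdelay Resp": 54,
    "Pdelay Resp Follow Up": 54,
}


def ethernet_packet_size(ptp_bytes: int) -> int:
    """Bytes occupied on the link by one PTP message, including padding and IPG."""
    payload = max(ptp_bytes, MIN_PAYLOAD)  # short messages are padded
    return PREAMBLE_AND_SFD + ETH_HEADER + payload + FCS + IPG


if __name__ == "__main__":
    for name, size in PTP_MESSAGE_SIZES.items():
        print(f"{name:22s} {size:3d} {ethernet_packet_size(size):4d}")
```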

[Figure 3.9: chart "Network load" showing the occupation of the network (axis from 0.00000% to 0.01800%) for message rates of 128, 64, 32, 16 and 8 messages/s, with the series: sync and follow up, E2E delay average, E2E delay first hop, P2P delay]

Figure 3.9 shows the network load for the different messages in a network with 9 nodes. As discussed in the section End-to-End vs Peer-to-Peer, the network load between two nodes depends on where in the chain the nodes are located when E2E is used. The worst case is between the master node and the first slave node. Even though the packets are slightly bigger when P2P is used instead of E2E, the total network load will be much smaller with P2P.

3.2.5 Frequency accuracy

The ToD is a register that contains the time and is updated every clock cycle. If the frequency at a slave differs from the frequency at the master, this leads to the error that was discussed in Section 3.1.4. A solution would be to change the increment at the slave: instead of accumulating 4 ns every clock cycle, 3.9996 ns can be used. This would compensate for a slave clock that is 100 ppm faster than the master clock. This increases the accuracy of the frequency, but there will still be errors if the slave clock starts to drift.
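The adjusted increment can be derived from the frequency offset. The sketch below assumes a nominal 250 MHz ToD clock (4 ns period) and a slave oscillator that runs a known number of ppm too fast. The function names are hypothetical.

```python
def compensated_increment_ns(nominal_increment_ns: float, slave_fast_ppm: float) -> float:
    """ToD increment per cycle for a slave clock that runs slave_fast_ppm too fast."""
    return nominal_increment_ns / (1.0 + slave_fast_ppm * 1e-6)


def accumulated_time_ns(cycles: int, increment_ns: float) -> float:
    """Time accumulated in the ToD register after a number of clock cycles."""
    return cycles * increment_ns


if __name__ == "__main__":
    increment = compensated_increment_ns(4.0, 100.0)
    print(increment)
    # A 250 MHz clock that is 100 ppm fast produces 250 025 000 cycles per second
    print(accumulated_time_ns(250_025_000, 4.0))        # uncompensated, 100 µs too much
    print(accumulated_time_ns(250_025_000, increment))  # compensated, about 1 s
```

For a 100 ppm offset the increment is about 3.9996 ns, which is a change of only 0.4 ps per cycle, so the ToD register needs sub-nanosecond fractional bits to represent it.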

Another, more accurate, way to solve the problem with the frequency accuracy is to use SyncE. With SyncE the slave clock synchronizes to the master clock at the physical level and has the same stability as the master clock through the whole system. There are some drawbacks to using SyncE; the biggest one is that a more accurate local clock is necessary at the slave nodes. Instead of a clock with an accuracy of 100 ppm, which is the requirement for Ethernet, the local clock needs to be within 4.6 ppm. There can be at most 10 clocks of this type cascaded before an even better clock is necessary. SyncE achieves a long term frequency accuracy of 10 parts per trillion [11]. This corresponds to a time error of 10 ps in one second.

Chapter 4

Frequency recovery

4.1 Problems

The frequency problem deals with recovering the radio frequency from Ethernet. As mentioned before, the transmit frequency of Ethernet is 10.3125 GHz. After recovery of the frequency and division by 16, the frequency is 644.53125 MHz. From this, a radio frequency that is a multiple of 3.84 MHz shall be recovered. To recover a radio frequency, a phase locked loop (PLL) can be used. To get an idea of how good the result can be, a simulation tool from Analog Devices has been used. The tool has some constraints, and because of them a frequency of 206.25 MHz has been used instead of 644.53125 MHz. This is 8/25 × 644.53125 MHz, or 10.3125 GHz divided by 50 instead of 16. The reason is that the software tool cannot handle more than 3 decimals and needs an fpd that is less than 247.5 MHz. The radio frequency used in the simulations is 491.52 MHz, which is the same as 128 × 3.84 MHz.

4.2 Possible solutions

There is one major decision that needs to be made: either to implement the PLL with an integer divider, or to use a fractional PLL with a fractional divider.

The integer divider is the best known one and is used in an integer PLL, which can be seen in Figure 4.1. The input frequency, fpd, is compared with fref, and the output is filtered and fed to a voltage controlled oscillator (VCO), which controls the output frequency. When the system is stable, the output frequency will be the same as the input frequency multiplied by N, where N is an integer. Figure 4.2 shows a typical phase noise graph for an integer PLL. The top line is the total phase noise and the only one that is of interest in the graph.

[Block diagram: error detector, loop filter, VCO and 1/N divider, with the signals fref, fpd and fout]

Figure 4.1: Block diagram over an integer PLL

Figure 4.2: Phase noise graph for an integer PLL
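The integer relation between the frequencies can be explored with a short calculation. The sketch below assumes that the input clock is first divided by an integer R down to a common comparison frequency, which is then multiplied by the integer N in the loop. The reference divider R and the function name are assumptions for the illustration and are not described in the text.

```python
from math import gcd


def integer_pll_dividers(f_in_hz: int, f_out_hz: int):
    """Return (R, N, f_pd) with f_in / R = f_pd and f_out = N * f_pd.

    f_pd is chosen as the highest frequency that divides both f_in and f_out.
    """
    f_pd = gcd(f_in_hz, f_out_hz)
    return f_in_hz // f_pd, f_out_hz // f_pd, f_pd


if __name__ == "__main__":
    # Frequencies used in the simulations: 206.25 MHz in, 491.52 MHz out
    r, n, f_pd = integer_pll_dividers(206_250_000, 491_520_000)
    print(r, n, f_pd)
    # Sanity check of the two relations
    print(r * f_pd == 206_250_000, n * f_pd == 491_520_000)
```

The comparison frequency that results is low compared with the input clock, which is the resolution trade-off that motivates the fractional alternative.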

The other way to implement it is to use a fractional PLL. This is very similar to the integer PLL but has one big difference. Instead of using a divider with an integer value N, it can use both the value N and N + 1. By switching between them over a period, the average is seen as a fractional value M. A block diagram of a simple fractional PLL can be seen in Figure 4.3. For example, if N = 3 and a period of 10 clock cycles is used, 3.7 can be achieved by using the value N (3) for 3 clock cycles and N + 1 (4) for 7 clock cycles. The output frequency will in this case be M × fpd. With this kind of implementation, a higher fpd can be used without losing frequency resolution. This leads to a faster system [...] phase noise can be seen in Figure 4.4. The top line is the total phase noise and the only one that is of interest in the graph.
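The averaging between N and N + 1 can be illustrated with a small model. The first function computes the average divider value for a given split of the period. The second generates one possible switching sequence with a first-order accumulator, which is an assumption made for the illustration and not necessarily the scheme used in the simulated PLL. All names are hypothetical.

```python
from fractions import Fraction
from typing import List


def average_divider(n: int, cycles_n: int, cycles_n_plus_1: int) -> Fraction:
    """Average divider value when N and N + 1 are used for the given cycle counts."""
    total = cycles_n + cycles_n_plus_1
    return Fraction(n * cycles_n + (n + 1) * cycles_n_plus_1, total)


def divider_sequence(n: int, numerator: int, period: int) -> List[int]:
    """One period of divider values with average n + numerator / period.

    A first-order accumulator selects N + 1 each time it overflows.
    """
    sequence = []
    accumulator = 0
    for _ in range(period):
        accumulator += numerator
        if accumulator >= period:
            accumulator -= period
            sequence.append(n + 1)
        else:
            sequence.append(n)
    return sequence


if __name__ == "__main__":
    print(average_divider(3, 3, 7))
    seq = divider_sequence(3, 7, 10)
    print(seq, Fraction(sum(seq), len(seq)))
```

The periodic switching pattern is also the origin of the fractional spurs, since it repeats at a fixed rate inside the loop.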

[Block diagram: error detector, loop filter, VCO and 1/M divider switching between N and N + 1, with the signals fref, fpd and fout]

Figure 4.3: Block diagram over a fractional PLL

Figure 4.4: Phase noise graph for a fractional PLL

By reducing the loop bandwidth and adding an extra pole in the filter, the spurs can be reduced. In Figure 4.5 the worst fractional spur is reduced by almost 90 dBc. But even if the spurs are reduced a lot, they still exist and may cause problems.

Figure 4.5: Phase noise graph for a fractional PLL with reduced spurs

Choosing the best PLL solution is difficult and is a trade-off between properties like phase noise, spurs and lock time. It is up to the designer to decide which properties are most important for each application and to make a suitable design accordingly.

Chapter 5

Result

The different choices for implementing PTP have been discussed, and to get a highly accurate solution PTP needs to be implemented with hardware support. The hardware will be located at the XGMII between the PCS and the MAC unit. Because hardware support is necessary for the timestamp, it is convenient to use two-step synchronization, which minimizes the complexity in the hardware without affecting the time accuracy.

The best solution for forwarding nodes is to implement them as TCs. If a node contains a network element and does not only forward messages, an OC needs to be attached to the TC.

P2P and E2E are equally accurate. P2P is recommended because it affects the network load less than E2E if more than 3 nodes are used. The work is also moved from the master to the slave nodes. An E2E solution would also need extra memory to store the correction time for the delay measurement, which is not necessary with P2P. If the network that is going to be used is not totally owned by the user, and it cannot be guaranteed that all nodes have a PTP implementation, E2E is preferable, but in that kind of network the accuracy will be heavily reduced.

Both sync and Pdelay Req messages will be sent at a rate of 32 messages/s. This means that Follow Up, Pdelay Resp and Pdelay Resp Follow Up will be sent at the same rate. This is a high value, chosen to make sure that the message interval will not affect the accuracy. The announce message is of less interest and its rate will be set to 1 message/s. In total the PTP messages will occupy the network with

$$8 \cdot [102 + 32 \cdot (84 + 84) + 32 \cdot (92 + 92 + 92)] = 114480\ \text{bits/s}$$

In 10GBASE Ethernet this corresponds to about 0.001% of the network capacity.
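The network load can be recomputed for other message rates with a small helper. The packet sizes are taken from Table 3.1, the link rate is assumed to be 10 Gbit/s, and the function names are hypothetical.

```python
ANNOUNCE_BYTES = 102       # Ethernet packet size from Table 3.1
SYNC_BYTES = 84            # also used for Follow Up
PDELAY_BYTES = 92          # Pdelay Req, Pdelay Resp and Pdelay Resp Follow Up
LINK_RATE_BPS = 10e9       # assumption: 10 Gbit/s of usable link capacity


def ptp_load_bits_per_s(sync_rate: int = 32, pdelay_rate: int = 32,
                        announce_rate: int = 1) -> int:
    """PTP traffic on one link with two-step synchronization and P2P delay."""
    bytes_per_s = (announce_rate * ANNOUNCE_BYTES
                   + sync_rate * (SYNC_BYTES + SYNC_BYTES)   # Sync + Follow Up
                   + pdelay_rate * (3 * PDELAY_BYTES))       # three Pdelay messages
    return 8 * bytes_per_s


def ptp_load_percent(**rates: int) -> float:
    """Share of the link capacity used by PTP, in percent."""
    return 100.0 * ptp_load_bits_per_s(**rates) / LINK_RATE_BPS


if __name__ == "__main__":
    print(ptp_load_bits_per_s())
    print(f"{ptp_load_percent():.5f} %")
```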

The frequency is preferably transferred with SyncE. In contrast to PTP, where only the end nodes need to implement PTP, SyncE requires all nodes in the chain to implement SyncE. If one node does not implement this technique, the rest of the chain cannot be guaranteed a good synchronization.

To estimate the time error from when an event message arrives until it is timestamped, all the time errors need to be summed. First there is the asymmetry. With a well designed PCS, the asymmetry in the node can be reduced to only the variable delay in the FIFO. The error will then be 0.68 ns if the standard width of 16 bits is used between the PCS and PMA. The asymmetry from the communication medium is then added, but no number is presented for it here. The error that arises from the resolution of the clock is half of the clock period. By using the recovered clock of 644.53 MHz, the error will be 0.78 ns. With SyncE for frequency synchronization, the frequency error will be under 1 ps and can therefore be neglected. In total, the error does not need to be greater than 1.46 ns if the asymmetry in the medium is neglected. To achieve this accuracy, a well designed PCS and FIFO are needed.
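The error budget can be summarized in a few lines. The sketch takes the variable FIFO delay of 1.36 ns from Section 3.2.3, assumes that it is halved by the compensation, and assumes that the recovered clock is the 10.3125 GHz line rate divided by the bit width. The function names are hypothetical.

```python
LINE_RATE_HZ = 10_312_500_000


def half_period_ns(clock_hz: float) -> float:
    """Resolution error: half a period of the clock that updates the ToD."""
    return 0.5e9 / clock_hz


def total_time_error_ns(fifo_variable_delay_ns: float = 1.36,
                        pma_bit_width: int = 16,
                        frequency_error_ns: float = 0.0) -> float:
    """Sum of the error contributions, with the medium asymmetry neglected."""
    fifo_error = fifo_variable_delay_ns / 2            # after compensation
    resolution_error = half_period_ns(LINE_RATE_HZ / pma_bit_width)
    return fifo_error + resolution_error + frequency_error_ns


if __name__ == "__main__":
    print(round(half_period_ns(LINE_RATE_HZ / 16), 2))
    print(round(total_time_error_ns(), 2))
```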

It is possible to recover a radio frequency from Ethernet, but the highest common frequency is 15 kHz and there is no general way to recover a radio frequency that suits all applications.

Chapter 6

Conclusion

The proposed solution is somewhat expensive, and if the accuracy is not needed, a simpler and cheaper solution can be used. For example, two synthesizers are necessary in the proposed solution: one to get a frequency traceable to the PRC according to SyncE, and one to recover a radio frequency from the medium. SyncE also needs a more expensive local oscillator than ordinary, non-synchronous Ethernet.

Texas Instruments has a PTP device (DP83640) which also has a SyncE mode that can be activated through a register. The test results show that by activating SyncE, the peak-to-peak time error can be reduced from 119.25 ns to 700 ps [1]. The standard deviation also decreases, from 9.537 ns down to 77.5 ps. In comparison, the theoretical error of 1.46 ns presented in this report seems to be a realistic peak-to-peak value. In the test the standard deviation is almost one tenth of the peak-to-peak value, which gives a roughly estimated standard deviation of 150 ps.

Regarding the frequency recovery, there will always be a trade-off, and each application needs to decide which property is most important. To get the wanted result for a specific application, a more detailed investigation needs to be done. It would also be preferable to have measurements and not only simulations and theoretical values.

Bibliography

[1] AN-1730 DP83640 synchronous Ethernet mode: Achieving sub-nanosecond accuracy in PTP applications. http://www.ti.com/lit/an/snla100a/snla100a.pdf.

[2] Ethernet IEEE 802.3 tutorial - an overview or tutorial of Ethernet, IEEE 802.3 used widely for local area network, LAN applications.

[3] IEEE-1588 standard for a precision clock synchronization protocol for networked measurement and control systems.

[4] Timing and synchronization aspects in packet networks. http://handle.itu.int/11.1002/1000/12015.

[5] Timing characteristics of a synchronous Ethernet equipment slave clock. http://handle.itu.int/11.1002/1000/10909.

[6] Information technology - open systems interconnection - basic reference model: The basic model. Nov 1994.

[7] IEEE standard for a precision clock synchronization protocol for networked measurement and control systems. IEEE Std 1588-2008 (Revision of IEEE Std 1588-2002), pages c1–269, July 2008.

[8] IEEE standard for Ethernet - section 1. IEEE Std 802.3-2012 (Revision to IEEE Std 802.3-2008), Dec 2012.

[9] IEEE standard for Ethernet - section 4. IEEE Std 802.3-2012 (Revision to IEEE Std 802.3-2008), Dec 2012.

[10] Alexandra Dopplinger and Jim Innis. Using IEEE 1588 for synchronization of network-connected devices.

[11] J.-L. Ferrant, M. Gilson, S. Jobert, M. Mayer, M. Ouellette, L. Montini, S. Rodrigues, and S. Ruffini. Synchronous Ethernet: a method to transport synchronization. IEEE Communications Magazine, 46(9):126–134, September 2008.

[12] Jean-Loup Ferrant, Mike Gilson, Sébastien Jobert, Michael Mayer, Laurent Montini, Michel Ouellette, Silvana Rodrigues, and Stefano Ruffini. Standards in telecom packet networks using synchronous Ethernet and/or IEEE 1588. Synchronous Ethernet and IEEE 1588 in Telecoms, page 329, 2013.

[13] J. Han and D.-K. Jeong. A practical implementation of IEEE 1588-2008 transparent clock for distributed measurement and control systems. IEEE Transactions on Instrumentation and Measurement, 59(2):433–439, 2010.

[14] D. Mohl and M. Renz. Improved synchronization behavior in highly cascaded networks. In Precision Clock Synchronization for Measurement, Control and Communication, 2007. ISPCS 2007. IEEE International Symposium on, pages 96–99, Oct 2007.

[15] Michel Ouellette, Ji Kuiwen, Liu Song, and Li Han. Using IEEE 1588 and boundary clocks for clock synchronization in telecom networks. IEEE Communications Magazine, 49(2):164–171, 2011.

[16] S. Rodrigues. IEEE-1588 and synchronous Ethernet in telecom. In Precision Clock Synchronization for Measurement, Control and Communication, 2007. ISPCS 2007. IEEE International Symposium on, pages 138–142, Oct 2007.

Avdelning, Institution Division, Department Datum Date Spr˚ak Language Svenska/Swedish Engelska/English Rapporttyp Report category Licentiatavhandling Examensarbete C-uppsats D-uppsats ¨Ovrig rapport

URL f¨or elektronisk version

ISBN

ISRN

Serietitel och serienummer Title of series, numbering

ISSN

Link¨oping Studies in Science and Technology

Thesis No. 4824 Titel Title F¨orfattare Author Sammanfattning Abstract


ISY, Department of Electrical Engineering, 581 83 Linköping
February 20, 2015
LiTH-ISY-EX--15/4824--SE
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-115882


Abstract
Acknowledgements
Contents
List of Figures
List of Tables

Chapter 1 Introduction
1.1 About the work
1.1.1 Communication protocols
1.2 Presentation of the problem
1.3 Restrictions
1.3.1 Topologies
1.3.2 The system
1.3.3 Time error budget
1.4 Related research
1.5 Outline

Chapter 2 Background
2.1 Ethernet
2.1.1 Media access control
2.1.2 Physical coding sublayer
2.2 Precision time protocol
2.2.1 Synchronization
2.2.2 Different types of clocks
2.3 Synchronous Ethernet
2.4 Expressions
2.4.1 Parts per million
2.4.2 Free running
2.4.3 Hop
2.4.4 Topology
2.4.5 Ingress
2.4.6 Egress

Chapter 3 Time Accuracy
3.1 Problems
3.1.1 Reference point
3.1.2 Resolution of the timestamp
3.1.3 Asymmetry
3.1.4 Frequency accuracy

3.1.5