Adaptive TDC: Implementation and Evaluation of an FPGA


Institutionen för systemteknik

Department of Electrical Engineering

Thesis

Adaptive TDC

Implementation and Evaluation of an FPGA

Thesis carried out in Electronics at the Institute of Technology at Linköping University

by

Simon Andersson Holmström

LiTH-ISY-EX-ET-15/0428–SE

Linköping 2015

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet


Supervisor: Andreas Ehliar

ISY, Linköpings universitet

Examiner: Jan-Åke Larsson

ISY, Linköpings universitet


Division, Department: Division of Computer Engineering (Avdelningen för Datorteknik), Department of Electrical Engineering, SE-581 83 Linköping

Date: 2015-04-29

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-134424

ISRN: LiTH-ISY-EX-ET-15/0428–SE

Title: Adaptive TDC

Author: Simon Andersson Holmström


Abstract

A time to digital converter (TDC) is a digital unit that measures the time interval between two events. This is useful for determining the characteristics and patterns of a signal or an event. This thesis presents a hybrid TDC that combines a tapped delay line with a clock counter.

The TDC is intended to measure the time between received data in a quantum key distribution (QKD) application. If the measured time does not exceed a certain value, the data has been sent without any interception. TDCs can also be used in other fields, such as laser ranging and time-of-flight applications.

For each channel, the TDC consists of two carry chains, an encoder, a FIFO and a counter; the channels share an AXI module and a control unit that generates command signals for all implemented channels. The time is measured by sampling the signal that has propagated through the carry chain and encoding the propagation length from this sample.

The implemented TDC has a 10 ns dead time and a resolution below 28 ps in four-channel mode. During testing, the propagation variation was approximately two percent of the total value. The design is written in the hardware description language (HDL) SystemVerilog and implemented on an FPGA board with a Zynq XC7Z020 SoC.


Abbreviations

ALU . . . Arithmetic Logic Unit
AXI . . . Advanced eXtensible Interface
CLB . . . Configurable Logic Block
CSR . . . Control Status Register
DFF . . . Data Flip-Flop
DSP . . . Digital Signal Processor
DUT . . . Device Under Test
FIFO . . . First In, First Out
FPGA . . . Field Programmable Gate Array
FSM . . . Finite State Machine
HDL . . . Hardware Description Language
LSB . . . Least Significant Bit
LUT . . . Look-Up Table
PG . . . Pattern Generator
PL . . . Programmable Logic
PLL . . . Phase Locked Loop
PS . . . Processing System
QKD . . . Quantum Key Distribution
RAM . . . Random Access Memory
SIMD . . . Single Instruction Multiple Data
SoC . . . System on a Chip
T_LSB . . . Lowest measurable time interval for the TDC
TDC . . . Time to Digital Converter


List of Figures

2.1 Overview of a CLB
2.2 CARRY4 element
2.3 Basic delay chain
2.4 Principle for a synchronous TDC
2.5 Operating principle of a delay chain in the TDC
2.6 A delay chain with a clock skew
3.1 The top level overview
3.2 Overview of one TDC module
3.3 Screenshots from PlanAhead
3.4 The FSM in the control module
4.1 An overview of the software test bench setup
4.2 A comparison between maximum and minimum propagation values
4.3 A comparison between maximum and minimum propagation variance


List of Tables

4.1 Differences in maximum value between carry chains
4.2 Differences between some start signals
4.3 Differences in propagation between some runs with some added logic
4.4 A collection of maximum and minimum values for the trace encoder
4.5 Resource utilization on the FPGA for four channels TDC
4.6 Distribution of the used resources
4.7 Resource utilization on the FPGA for four channels TDC with calibration
4.8 Resource utilization on the FPGA for the four channel TDC with trace encoder
4.9 A table over the clock path skew


1

Introduction

This chapter introduces the thesis and briefly presents the background, the problem formulation, the limitations and the method used.

1.1 Background

This thesis investigates the possibilities and challenges of designing a time to digital converter (TDC). The TDC measures the time interval between a start and a stop signal and converts the measured time into a digital value. This is useful for measuring instruments in various fields.

Provided that the precision is sufficiently high, the TDC can be used to measure disturbances on a data channel. One application that takes advantage of this is Quantum Key Distribution (QKD), which is provably secure for sharing information over a public channel without any third party gaining insight into the information. A popular protocol for QKD is BB84 (Bennett and Brassard, 1984) [Michael and Isaac, 2000, Qi and Weiyue, 2013].

The provable security of QKD is based on the no-cloning theorem, and the only requirement is that qubits can be communicated through the public channel with an error rate below a certain threshold. The basic principle of QKD is that an eavesdropper (Eve) cannot intercept the transmission between the two communicating parties (Alice and Bob) without affecting the signal [Qi and Weiyue, 2013, Michael and Isaac, 2000].


Qubits are the information carriers in quantum systems. Unlike ordinary data bits, which are either zero or one, a qubit can also be in a superposition, i.e. a linear combination of the zero and one states [Michael and Isaac, 2000].

TDCs are also used in other fields, such as time-of-flight and laser ranging applications. TDCs can be implemented with different techniques, such as oscillators, counters and delay lines. In this thesis the TDC is implemented on an FPGA using a combination of a delay line and a counter, which makes the design flexible and cost-efficient.

1.2 Problem formulation and limitations

The goal of this thesis is to design and evaluate an FPGA-based TDC. The main purpose is to use the design in a QKD system. To work correctly in this application, the TDC must have a resolution finer than 100 ps.

The system is designed for a Xilinx Zynq, so some changes may be needed if it is implemented on another platform.

Due to the limited time, certain limitations had to be made. For instance, the impact of temperature on the design is not studied; this is left for future work.


1.3 Method

To obtain the necessary knowledge in the given area, the thesis started with a literature study.

After obtaining the necessary knowledge, the thesis continues by formulating a model of the system and dividing it into smaller modules that are implemented separately. Each module is then tested individually in the software test bench and its defects are corrected. When no more defects are found, the modules are assembled into larger modules and retested in the test bench for errors caused by the merging. This proceeds until the entire system has been assembled and verified.

After the design has been verified, it is implemented on a development board called Zedboard. This board is also used together with a pattern generator (PG) to verify the design. During the verification phase, some minor modifications were made to make the system more stable and predictable. Additionally, some functions were added to ease testing and measurement.

Once the functionality of the design has been confirmed on the board, the board is used to evaluate different configurations of the system.


2

Theory

This chapter introduces some of the theory behind the TDC. It starts with the basic building blocks of an FPGA and continues with a short description of the theory of the TDC and how it can be implemented. This is followed by a review of some characteristics of the TDC that can affect the delay time. The chapter ends with the data encoding of the TDC.

2.1 FPGA

FPGAs are the most common type of reconfigurable logic. Reconfigurable means that the hardware is constructed in such a way that the logic can be programmed after fabrication [Marwedel, 2011, Wolf, 2004].

The subsections below explain some of the building blocks of FPGAs. The details are based on the Xilinx 7-series FPGAs and may therefore differ from those of other suppliers or FPGA models.


2.1.1 Configurable Logic Block

The Configurable Logic Block (CLB) is the main logic resource, and each CLB consists of two slices. CLBs are used to implement sequential and/or combinational circuits. The CLBs are connected through a switch matrix; fig. 2.1 illustrates a CLB and its connections [Xil, 2014a].


Figure 2.1: Overview of a CLB, based on figure from [Xil, 2014a].

The two slices in a CLB are not connected to each other; instead, each slice is connected to the slice with the same position in the CLBs directly below and above [Xil, 2014a].

2.1.2 Slice

Xilinx 7-series FPGAs have slice types specialized for typical application purposes; namely, there are two types, SLICEL and SLICEM. The difference between them is that SLICEM has added functions for storing and shifting data. Depending on the model, there are between 2.6K and 305.4K slices [Xil, 2014a].


Each slice contains:

• Four logic-function generators, or look-up tables
• Eight storage elements
• Wide-function multiplexers
• Carry logic

The wide-function multiplexers can be used to form a 27-input combinational function or a 16:1 multiplexer in one slice, and even wider multiplexers can be created across multiple slices [Xil, 2014a].

The carry logic in each slice consists of four carry logic elements connected in a chain. Each element consists of one multiplexer and one XOR gate. Figure 2.2 illustrates this carry chain block (CARRY4) [Xil, 2014a].


Figure 2.2: The CARRY4 element, based on figure from [Xil, 2014a].

The carry logic is usually used for arithmetic functions. When carry logic is cascaded, the propagation delay increases linearly with the number of bits of the operand. The number of carry logic elements that can be cascaded is limited by the column height of slices on the FPGA [Xil, 2014a].


2.1.3 DSP48E1

The DSP48E1 is a slice intended for DSP applications and therefore has dedicated functions such as a multiplier, an accumulator, pattern detection and a SIMD ALU [Xil, 2014c].

2.1.4 Block RAM

Block RAMs are commonly integrated in FPGAs. They are RAM blocks distributed across the chip and can be used to construct First-In, First-Out (FIFO) registers, large shift registers and ROMs. Xilinx 7-series FPGAs have between 25 and 1880 block RAMs of 36 Kb each [Xil, 2014b].

2.2 Time to Digital Converter

A time to digital converter (TDC) is an electronic component that converts a time interval into a digital code. It can be used to measure disturbances in a signal flow, for instance in the QKD application mentioned in section 1.1.

2.2.1 Principle

There are different ways to implement a TDC. One way is to use a counter that counts the number of clock cycles that elapse between a start and a stop signal. The main drawback of this design is that the resolution is limited by the highest clock frequency the system can achieve.

A higher resolution can be obtained by dividing each clock interval into smaller time intervals. This is usually called a tapped delay chain, and fig. 2.3 illustrates how the technique is implemented [Henzler, 2010].


Figure 2.3: A basic delay chain, based on figure from [Henzler, 2010].

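Here every clock interval is divided into smaller intervals by a chain of digital elements through which the signal propagates. By analyzing fig. 2.3, the model can be described by the following equations [Henzler, 2010].

$$\Delta T = N T_{clk} + \Delta T_{start} - \Delta T_{stop} \tag{2.1}$$

$$\Delta T_{start} = N_1 \frac{T_{clk}}{k} - \epsilon_1 \tag{2.2}$$

$$\Delta T_{stop} = N_2 \frac{T_{clk}}{k} - \epsilon_2, \qquad \epsilon_1, \epsilon_2 \in \left[0;\ T_{LSB} = \frac{T_{clk}}{k}\right] \tag{2.3}$$

If eq. (2.2) and eq. (2.3) are inserted into eq. (2.1), we get the following equations describing the properties of a time interval [Henzler, 2010].

$$\Delta T = N T_{clk} + N_1 \frac{T_{clk}}{k} - \epsilon_1 - \left(N_2 \frac{T_{clk}}{k} - \epsilon_2\right) = N T_{clk} + (N_1 - N_2)\frac{T_{clk}}{k} + \epsilon \tag{2.4}$$

$$\epsilon = \epsilon_2 - \epsilon_1 \in \left[-\frac{T_{clk}}{k};\ \frac{T_{clk}}{k}\right] \tag{2.5}$$

Here $\epsilon$ is the resulting quantization error and $k$ is the number of delay elements per clock period. Fig. 2.4 illustrates the events these equations describe, with the key values $N T_{clk}$, $\Delta T_{start}$ and $\Delta T_{stop}$ marked in the figure [Henzler, 2010].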
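As a worked illustration of eq. (2.4), the C sketch below computes the interval from the coarse count $N$ and the two fine counts $N_1$ and $N_2$. It leaves out $\epsilon$, so the result is only accurate to within $\pm T_{LSB}$. The function name and the value $k = 500$ in the usage note are assumptions chosen for illustration, close to the roughly 500 carry steps per 10 ns clock cycle reported in chapter 4.

```c
#include <assert.h>

/* Sketch of eq. (2.4) with the quantization error term left out:
 *   dT ~= N*Tclk + (N1 - N2) * Tclk/k
 * All times are in picoseconds. k is the assumed (constant) number of
 * delay elements per clock period, so Tclk/k equals T_LSB. */
double tdc_interval_ps(unsigned n_coarse, unsigned n1, unsigned n2,
                       double t_clk_ps, unsigned k)
{
    assert(k > 0);
    double t_lsb_ps = t_clk_ps / k;
    /* Convert to double before subtracting so that n1 < n2 gives a
     * negative fine correction instead of an unsigned wrap-around. */
    return n_coarse * t_clk_ps + ((double)n1 - (double)n2) * t_lsb_ps;
}
```

With $T_{clk}$ = 10 000 ps (100 MHz) and $k$ = 500, $T_{LSB}$ is 20 ps, so $N$ = 3, $N_1$ = 250 and $N_2$ = 100 give 30 000 + 150 · 20 = 33 000 ps.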


Figure 2.4: Principle for a synchronous TDC, based on figure from [Henzler, 2010].

Fig. 2.5 illustrates how the start signal propagates through a group of delay elements. From this figure we can see that the number of delay elements needed depends on the time range to be measured.

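If we assume that all delay elements have the same delay, the number of delay elements can be determined by eq. (2.6) [Henzler, 2010].

$$N_{Delaysteps} = \left\lfloor \frac{\Delta T}{T_{LSB}} \right\rfloor \tag{2.6}$$

Here $N_{Delaysteps}$ is the number of delay elements, $\Delta T$ is the time period and $T_{LSB}$ is the delay of one element, i.e. the smallest measurable time interval.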


Figure 2.5: Operating principle of a delay chain in the TDC; the interval between the vertical lines is ∆T.
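Eq. (2.6) can be evaluated with a small C helper, sketched below in integer picoseconds so that the floor is exact. Like the equation, it assumes identical delay elements; the name `delay_steps` is hypothetical.

```c
#include <assert.h>

/* Eq. (2.6) in integer picoseconds: the number of identical delay
 * elements of t_lsb_ps each that fit into the time range dt_ps.
 * Unsigned integer division truncates, which is the floor here. */
unsigned delay_steps(unsigned dt_ps, unsigned t_lsb_ps)
{
    assert(t_lsb_ps > 0);
    return dt_ps / t_lsb_ps;
}
```

For example, covering one 10 ns clock period with 20 ps elements requires `delay_steps(10000, 20)` = 500 elements, which fits within the 600 carry elements per chain used in chapter 3.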

2.2.2 Characteristic sources

There are multiple sources that can affect the precision and characteristics of the TDC. Some of these sources and their possible impact are described below.

Process Variations

The behavior of the TDC may differ due to process variations. Small structural variations in production make each logic gate differ slightly in size, which affects the characteristics of the component [Henzler, 2010].

Temperature also affects how the TDC performs; for instance, resistance is temperature dependent. Xi and Qi have studied how the resolution changes with temperature [Xi and Qi, 2013]. Due to time constraints, this has been left out of this thesis.

Logic structure

The logic structure differs between FPGA architectures, which creates differences in path length and thus in the delay between logic elements. In the Zynq device there are four carry logic elements in each SLICE, and the interconnect length may differ within and between SLICEs [Harmen and Edoardo, 2011].

Clock distribution and interconnect

In an ideal delay chain, all elements would be sampled simultaneously, but this is difficult to achieve. FPGAs are usually divided into a number of clock regions to compensate for the delay of the clock signal across the circuit and thereby minimize the clock skew [Henzler, 2010, Harmen and Edoardo, 2011, Xil, 2013].


Even if the clock distribution within these regions is perfectly balanced, local process variations can still cause skew. This is illustrated in fig. 2.6, where a shimming delay represents the clock skew [Henzler, 2010, Harmen and Edoardo, 2011].


Figure 2.6: A delay chain with a clock skew, based on figure from [Henzler, 2010].

2.3 Encoding

There are different techniques for encoding the delay line. One technique is to count the number of ones that have been sampled by the DFFs. The technique sums the ones from Q in fig. 2.3 / fig. 2.6, as in eq. (2.7) [Claudio and Edoardo, 2009].

$$\text{Delay} = \sum_{i=0}^{n} Q[i] \tag{2.7}$$

This technique is relatively simple to implement and fast because it only sums the ones. One drawback is that zeros (bubbles) may appear inside the chain where there should be ones, due to clock skew or other defects. This can give the encoder a small error rate in the encoded value [Claudio and Edoardo, 2009].

A solution to this drawback is to use another encoder that only tracks the one that has propagated farthest. This technique is not affected by bubbles of zeros inside the sampled value from the delay chain, but results in a more computationally intensive encoder [Claudio and Edoardo, 2009].
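The difference between the two encoders can be sketched in C, assuming the sampled delay line is stored as an array `q` in which a one means that the signal has reached that element, with index 0 nearest the input. The function names are hypothetical, and the trace variant only looks for the farthest one, as described above; this illustrates the principle rather than reproducing the encoder implemented in this thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Ones-counting encoder, eq. (2.7): the propagation length is taken as
 * the number of ones in the sampled delay line q[0..n-1]. */
unsigned encode_ones(const unsigned char *q, size_t n)
{
    assert(q != NULL || n == 0);
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += q[i] ? 1u : 0u;
    return sum;
}

/* Trace-style encoder: the propagation length is the position just past
 * the farthest sampled one, so zeros (bubbles) before it are ignored. */
unsigned encode_trace(const unsigned char *q, size_t n)
{
    assert(q != NULL || n == 0);
    for (size_t i = n; i > 0; i--)
        if (q[i - 1])
            return (unsigned)i;
    return 0;
}
```

For the sample 1 1 1 0 1 1 0 0, which contains one bubble, `encode_ones` returns 5 while `encode_trace` returns 6; for a bubble-free sample both return the same value.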


3

Design and implementation

This chapter reviews the design and its implementation. It starts with an overview of the hardware on which the system is implemented and then goes through the different parts of the design. The chapter ends with an overview of the software that has been added to run the design.

3.1 Hardware

During the implementation of the system, an FPGA development board named Zedboard was used. This board is based on the Xilinx Zynq™-7000 All Programmable SoC, which combines a dual-core ARM® Cortex™-A9 processing system (PS) and programmable logic (PL) on the same chip, made in 28 nm technology [Xil, 2013].

FPGAs are well suited for experimental and prototype development because of their low cost for small series, their flexibility in implementation and the ability to make upgrades and corrections later on [Xil, 2013].

Some features of the development board:

• Xilinx XC7Z020-1CLG484C
• Memory
  – 512 MB DDR3
  – 256 Mb Quad-SPI Flash
• Interfaces
  – USB-JTAG
  – USB 2.0 FS USB-UART Bridge
  – Five Digilent Pmod™ compatible headers

The system is implemented in the PL part of the Zynq and transmits data to the PS over the AXI bus. The PS runs the software, which handles simple data operations and transmission through the UART bridge.

The clock frequency of this system is set to 100 MHz, but higher clock frequencies can be used if necessary, see Xil [2013].

3.2 System

The top-level design consists of a communication module, a control module and one TDC module for each implemented channel. The TDC is of a hybrid type that combines a tapped delay chain with a clock counter. The top-level design with four receiver channels is illustrated in fig. 3.1.


Figure 3.1: The top level overview.

The implementation uses the hardware description language SystemVerilog, an extension of Verilog-2005 with a C-like syntax. The main advantage of SystemVerilog over other hardware description languages such as VHDL and classic Verilog is its added support for object-oriented programming techniques, which makes it easier to develop test benches.

The system is designed to be adaptable: the number of channels can be changed by instantiating the desired number of TDC modules and making minor changes to the control unit.


Each TDC module works independently. To be sure that the system detects a signal, the signal must last at least one clock cycle, and the time between two signals must also be at least one clock cycle. At 100 MHz, both of these periods must therefore be at least 10 ns.

However, the total speed of the system is limited by the data transfer rate through the UART bridge. In addition, the address width limits the number of modules the implementation can communicate with, and the available logic limits the number of modules that can be implemented.
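The two timing rules for detecting a signal can be expressed as a simple check. The C sketch below assumes that each input pulse is given as sorted rise and fall times in picoseconds; the function names are hypothetical, and the check only models the rules stated above, not the sampling hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Clock period in picoseconds for a clock frequency given in MHz. */
unsigned clock_period_ps(unsigned f_mhz)
{
    assert(f_mhz > 0);
    return 1000000u / f_mhz; /* 1 us = 1,000,000 ps */
}

/* Checks the two timing rules for one channel: every pulse lasts at least
 * one clock period, and every gap between consecutive pulses is at least
 * one clock period. Pulses are given as sorted rise/fall times in ps. */
bool pulse_train_ok(const unsigned *rise_ps, const unsigned *fall_ps,
                    unsigned count, unsigned t_clk_ps)
{
    for (unsigned i = 0; i < count; i++) {
        assert(fall_ps[i] > rise_ps[i]);
        if (fall_ps[i] - rise_ps[i] < t_clk_ps)
            return false; /* pulse too short to be sampled reliably */
        if (i > 0) {
            assert(rise_ps[i] >= fall_ps[i - 1]);
            if (rise_ps[i] - fall_ps[i - 1] < t_clk_ps)
                return false; /* gap to the previous pulse too short */
        }
    }
    return true;
}
```

At 100 MHz, `clock_period_ps(100)` is 10 000 ps, so a 10 ns pulse followed by a 10 ns gap passes, while a 5 ns gap or a 5 ns pulse does not.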

3.2.1 TDC

Each TDC module contains an encoder, a counter, a FIFO register and two carry chains. Fig. 3.2 illustrates how these parts are connected and how data flows through the TDC module.


Figure 3.2: Overview of one TDC module. In addition to the signals included in the figure, reset and clk go to all modules.

The delay line consists of two fast carry chains implemented with the CARRY4 primitive, illustrated in fig. 2.2. Using this primitive forces the synthesis tool to implement the chain in a specific pattern, which results in a short and more predictable path. Fig. 3.3.a shows a chain implemented without the primitive, and fig. 3.3.b one implemented with it. In the first figure the delay elements are spread over the chip with no clear pattern, while in the second they are lined up in the pattern desired for this application.

The two carry chains in each TDC module make it possible to receive signals at the same rate as the hardware's clock frequency. Each of these carry chains has 600 carry elements, which is the maximum number of elements that fits in a column. The top of a column is connected to the bottom of the next column, which results in a larger delay when the signal propagates to the next column.

Figure 3.3: Screenshots from PlanAhead.

(a) A carry chain without the correct placement.

(b) A carry chain with the correct placement.

In run mode, a one propagates through the delay chain when there is no start signal on the channel, and a zero propagates when there is a start signal. While the system is running, one chain is active for measurement while the other is sampled and restored; on the next clock cycle the two chains swap roles.

For each detected start signal, the encoder takes two sampled values from the delay chain: $\Delta T_{start}$ and $T_{clk} - \Delta T_{stop}$ from fig. 2.4. The first sampled value, $T_{clk} - \Delta T_{stop}$, is the time interval between the latest positive clock edge and the positive edge of the start signal, which indicates the end of a transfer. The second sampled value, $\Delta T_{start}$, is the time interval between the negative edge of the start signal and the latest positive clock edge, which indicates the start of a transfer.

$\Delta T_{start}$ and $T_{clk} - \Delta T_{stop}$ are decoded by counting the ones in each sampled value and using the count as an address into a storage register. For each transfer, the two resulting values are then added to the value of the counter, which counts the number of clock cycles between start signals, as in eq. (2.4).
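The summation can be sketched as follows, assuming the storage register is a lookup table indexed by the ones-count and holding a calibrated time in picoseconds. The function name, the argument order and the convention that both fine values are simply added to the coarse time are assumptions for illustration; how the actual design handles the offset between the counter and the two fine values is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the summation in the encoder:
 * lut_ps[i] holds the calibrated time (ps) for a sampled value with i
 * ones, ones_a and ones_b are the ones-counts of the two samples, and
 * coarse_count is the clock-cycle counter value. */
uint64_t tdc_interval_from_lut(const uint32_t *lut_ps, unsigned lut_len,
                               uint32_t coarse_count, uint32_t t_clk_ps,
                               unsigned ones_a, unsigned ones_b)
{
    assert(ones_a < lut_len && ones_b < lut_len);
    /* Widen before multiplying so long intervals do not overflow 32 bits. */
    return (uint64_t)coarse_count * t_clk_ps
         + lut_ps[ones_a] + lut_ps[ones_b];
}
```

With a non-uniform four-entry table {0, 20, 41, 63} ps, a coarse count of 2 at 10 000 ps and ones-counts of 3 and 1, the result is 20 000 + 63 + 20 = 20 083 ps. Using a table rather than multiplying by a single average delay allows each entry to absorb the uneven element delays discussed in subsection 2.2.2.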

After summation, the value is sent to the FIFO, which is implemented with the FIFO18E1 primitive. This primitive uses the built-in block RAM to form a FIFO and can be configured with a data width of 4, 9, 18 or 36 bits and a total size of 18 Kb. This system uses a data width of 36 bits, which gives a depth of 512 words.

The value is stored in the FIFO until it is requested by the PS through the AXI bus. If the FIFO becomes full, the encoder overwrites the last value, which results in data loss.

The FIFO and carry chains are instantiated directly as Xilinx primitives rather than generated by the Xilinx tools, so they must be replaced with similar primitives on devices from other vendors, or with other primitives better suited to the desired application.

At synthesis time, the decoding can be switched to a trace technique that, instead of summing the ones, traces the first and the last one to estimate the propagation value.

3.2.2 Control module

This module is implemented as one finite state machine (FSM) that keeps track of the state of the system, i.e. idle, initiate, calibration or run time. The FSM is a Moore machine, i.e. its output depends only on the current state. Fig. 3.4 gives an overview of the FSM in the control module with its states, transitions and transition conditions. This module also houses the control status register (CSR), which holds fundamental information about the state of the system.


Figure 3.4: The FSM in the control module. The jumps depend on the Control Status Register (CSR).

Initiate

This is the initial state, which runs during start-up and sets the start-up preferences for the system. The system can also be forced into this state by setting the control status register.

Calibration

This state calibrates the TDC modules, if hardware calibration was included during synthesis.

The calibration is based on the fact that $\Delta T$ is known and $N$ can be measured by the system; these two values are used in eq. (3.1), which is a rearrangement of eq. (2.6).

$$T_{LSB} = \frac{\Delta T}{N} + \epsilon \tag{3.1}$$

This equation gives the average delay of one delay element as the known value of $\Delta T$, the clock period, divided by the number of elements that the signal has propagated through.

The average value is multiplied by each possible encoder output, and the products are written to the storage register in the encoder. When calibration is done for every channel, the calibration flag in the status register is cleared and the FSM jumps to the idle state.
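A software model of this calibration step is sketched below in C. It ignores $\epsilon$ in eq. (3.1) and rounds each table entry to whole picoseconds; the function names and the rounding are assumptions. In the design itself, calibration is performed either by the extra calibration hardware or by functions running on the PS.

```c
#include <assert.h>
#include <stdint.h>

/* Eq. (3.1) without the error term: average delay of one element when a
 * known interval dt_ps (one clock period) spans n_measured elements. */
double average_element_delay_ps(uint32_t dt_ps, unsigned n_measured)
{
    assert(n_measured > 0);
    return (double)dt_ps / n_measured;
}

/* Table entry for an encoder output of i elements, rounded to whole ps. */
uint32_t lut_entry_ps(unsigned i, double t_lsb_ps)
{
    assert(t_lsb_ps >= 0.0);
    return (uint32_t)(i * t_lsb_ps + 0.5);
}

/* Fills a hypothetical encoder storage register (one entry per possible
 * encoder output 0..lut_len-1) from a single calibration measurement. */
void calibrate_lut(uint32_t *lut_ps, unsigned lut_len,
                   uint32_t dt_ps, unsigned n_measured)
{
    double t_lsb_ps = average_element_delay_ps(dt_ps, n_measured);
    for (unsigned i = 0; i < lut_len; i++)
        lut_ps[i] = lut_entry_ps(i, t_lsb_ps);
}
```

For a chain that spans 439 elements in one 10 ns clock period, the average element delay is about 22.8 ps, the table entry for an encoder output of 3 is 68 ps, and the entry for 439 is 10 000 ps.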

Runtime

In this state the system is active and measures the time intervals on each of the receiver channels. The measured values are stored in the FIFO and sent to the ARM core on request.
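The state transitions described in this section can be modelled as a Moore machine in C. This is a sketch under assumptions: the three request flags stand in for CSR bits whose exact layout is not reproduced here, the Initiate state is assumed to proceed directly to Idle, and a calibration request is assumed to take priority over a run request. The output `chains_enabled` is likewise a hypothetical example.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_INITIATE, ST_IDLE, ST_CALIBRATE, ST_RUN } tdc_state;

/* Hypothetical request flags standing in for CSR bits; the real bit
 * layout of the control status register is not modelled here. */
typedef struct {
    bool force_init; /* force the FSM back to the initiate state */
    bool calibrate;  /* calibration flag, cleared when calibration is done */
    bool run;        /* request to measure */
} csr_flags;

/* Next-state function. Assumptions: initiate always proceeds to idle,
 * and a calibration request has priority over a run request. */
tdc_state next_state(tdc_state s, csr_flags f)
{
    assert(s <= ST_RUN);
    if (f.force_init)
        return ST_INITIATE;
    switch (s) {
    case ST_INITIATE:
        return ST_IDLE;
    case ST_IDLE:
        if (f.calibrate)
            return ST_CALIBRATE;
        return f.run ? ST_RUN : ST_IDLE;
    case ST_CALIBRATE:
        return f.calibrate ? ST_CALIBRATE : ST_IDLE;
    case ST_RUN:
        return f.run ? ST_RUN : ST_IDLE;
    }
    return ST_INITIATE; /* unreachable for valid states */
}

/* Moore output: depends only on the state, here "delay chains enabled". */
bool chains_enabled(tdc_state s)
{
    return s == ST_RUN;
}
```

`chains_enabled` illustrates the Moore property: the output is a function of the current state alone, while the flags only influence which state comes next.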


3.2.3 AXI-int module

This module handles reads and writes on the AXI bus, which is a multi-master, multi-slave bus. The AXI interface consists of five channels [ARM, 2011, Xil, 2012]:

• Read Address Channel
• Write Address Channel
• Read Data Channel
• Write Data Channel
• Write Response Channel

The Zynq processor uses the second version of AXI, AXI4, which has three types of interfaces: AXI4, AXI4-Stream and AXI4-Lite.

Their properties are:

• AXI4 supports data bursts and has a traditional memory-mapped address and data interface.

• AXI4-Stream supports data-only bursts.

• AXI4-Lite supports only single-data-cycle transfers and has a traditional memory-mapped address and data interface.

The AXI-int module is in the PL part of the Zynq and communicates with the AXI interconnect. This application uses AXI4-Lite because it is simple to implement and use, and because it can transfer data to the ARM core at a higher rate than the UART. In this design the PS acts as master and actively requests data from the TDC, which is connected as a slave. The module is based on a design from ISY.

3.3 Software

Some simple, practical functions are written in C to ease communication between the system and the computer connected through the UART. Some functions initiate the system or set it to a specific state. There are also functions for reading values other than the measured values from the system, and it is possible to write to the state register and to the registers in the encoder.


4

Results

This chapter presents the test setup and the results of the tests performed in this thesis. It starts with an overview of the software test bench, followed by a description of the hardware test bench, and ends with the results from these test benches.

4.1 Software Test set-up

During the implementation phase, a software test bench is used to verify the functionality of the system. A basic sketch of the outline is shown in fig. 4.1.

This test bench is written in SystemVerilog and simulated in ModelSim, a simulation environment that supports multiple HDLs. The simulation environment and test bench were also useful when searching for error sources during the hardware evaluation.

Xilinx PlanAhead was also used during the evaluation to verify the structure, the placement and that the correct logic elements are used after synthesis. Its usefulness is illustrated by fig. 3.3.a and fig. 3.3.b, where the placement is essential for the TDC to function.

4.2 Hardware Test set-up

To simplify estimates and measurements, some registers were added to the design. These registers record the maximum reach of the start signal during one clock cycle for each of the eight carry chains. To perform functional tests and verify the control functions of the system, a pattern generator (Tektronix TLA7PG2) generates the start signal for each implemented channel.


Figure 4.1: An overview of the software test bench setup (blocks: DUT, PG, Reader, Testbench, Testmanager).


4.3 Measurements

To estimate the precision of the system, the propagation through the carry elements can be traced over a given time. During testing, the 100 MHz system clock was used as the reference time, and during one clock cycle the signal propagated through more than 500 carry steps. This figure applies to a single-channel design without internal calibration and with a minimum of surrounding logic.

Using eq. (2.6), the resolution is estimated to be below 20 ps (one 10 ns clock period divided by more than 500 steps), which meets the 100 ps requirement for the QKD application. This estimate is a simplification made because of limited time: the Zynq SoC has carry accelerators that speed up the propagation, and this acceleration would have to be measured to estimate its impact.

To achieve this resolution, the carry chain must be placed correctly, with a short path between elements; otherwise the resolution degrades drastically. For instance, the carry chain in fig. 3.3.a only has a resolution of 0.25 ns, while a chain lined up as in fig. 3.3.b can have a resolution below 20 ps for a single channel.

The placement of the carry chains is not locked, due to time limits. The number of carry steps the signal propagates through depends on the placement of the carry chain and may change between syntheses, since the surrounding logic can shift slightly.

The resolution decreased when more channels were added, as can be seen in fig. 4.2.

During measurements, the effects of the on-chip differences mentioned in subsection 2.2.2 are visible. They can be observed by comparing the maximum number of carry elements reached by the different channels in the same implementation. An example of the difference between maximum and minimum propagation for different numbers of channels and designs is shown in fig. 4.2.

The "extra calibration hardware" in the plot refers to logic that was added and used for calibration instead of functions running in the PS.

The plot illustrates the maximum and minimum value of propagation through the delay line during one clock cycle in a system running at 100 MHz. These values give the expected range of resolution that is possible to obtain from each configuration.


[Figure 4.2 plot: propagation (number of elements, axis 400 to 500) for 1, 2 and 4 channels; series: max and min, each with and without extra calibration hardware]

Figure 4.2: A comparison between maximum and minimum propagation values depending on the number of channels and design.

An estimate of the resolution obtainable from a four-channel implementation with eight carry chains is shown in table 4.1.

| | Ch 1 Line 1 | Ch 1 Line 2 | Ch 2 Line 1 | Ch 2 Line 2 | Ch 3 Line 1 | Ch 3 Line 2 | Ch 4 Line 1 | Ch 4 Line 2 |
|---|---|---|---|---|---|---|---|---|
| Steps | 439 | 394 | 420 | 376 | 421 | 380 | 437 | 393 |
| ≈Delay (ps/step) | 22.8 | 25.4 | 23.8 | 26.6 | 23.8 | 26.3 | 22.9 | 25.4 |

Table 4.1: Differences in maximum propagation between the carry chains, and the approximate delay per step.

To examine the fluctuation of the propagation, a test with multiple read-outs is performed. It is done in the same way as the previous tests, by measuring the propagation through the delay line during one clock cycle. A sample from one of these tests is shown in table 4.2 and plotted in fig. A.1.


| Read | Ch 1 Line 1 | Ch 1 Line 2 | Ch 2 Line 1 | Ch 2 Line 2 | Ch 3 Line 1 | Ch 3 Line 2 | Ch 4 Line 1 | Ch 4 Line 2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 442 | 394 | 420 | 375 | 424 | 380 | 434 | 390 |
| 2 | 439 | 390 | 420 | 376 | 424 | 380 | 434 | 394 |
| 3 | 439 | 394 | 418 | 373 | 424 | 380 | 434 | 392 |
| 4 | 441 | 391 | 418 | 374 | 422 | 380 | 434 | 394 |
| 5 | 442 | 397 | 420 | 376 | 424 | 382 | 430 | 394 |
| 6 | 437 | 394 | 420 | 373 | 421 | 380 | 434 | 394 |
| 7 | 442 | 394 | 420 | 376 | 420 | 380 | 433 | 390 |
| 8 | 442 | 390 | 418 | 376 | 422 | 380 | 432 | 394 |
| 9 | 438 | 397 | 420 | 376 | 421 | 380 | 435 | 398 |
| 10 | 442 | 397 | 420 | 376 | 422 | 380 | 433 | 394 |
| ≈Delay (ps/step) | 22.7 | 25.4 | 23.8 | 26.7 | 23.7 | 26.3 | 23.1 | 25.4 |

Table 4.2: Propagation (number of carry elements) for ten consecutive read-outs, without extra calibration hardware.

This test shows that the signal propagation through the delay chain fluctuates during run mode. However, the fluctuation does not exceed the requirements, as shown by the delay per step, Delay(ps)/step, in table 4.2, which is calculated with eq. (4.1).

$$\text{Delay(ps)/step} = \frac{\Delta T_{clk}}{\frac{1}{N_{read}}\sum_{i=1}^{N_{read}} read[i]} \tag{4.1}$$

where $\Delta T_{clk}$ is the clock cycle time, $N_{read}$ is the number of read-outs and $read[i]$ is the $i$-th read-out value.
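Eq. (4.1) can be checked against the tables with a few lines of code. The sketch below divides the clock period by the mean read-out; the function name `delay_per_step_ps` is hypothetical, and the data is channel 1, line 1 from table 4.2.

```python
from statistics import fmean

CLOCK_PERIOD_PS = 10_000.0  # Delta T_clk at 100 MHz


def delay_per_step_ps(readings: list[int], clock_period_ps: float = CLOCK_PERIOD_PS) -> float:
    """Eq. (4.1): clock period divided by the mean number of carry steps reached."""
    if not readings:
        raise ValueError("need at least one read-out")
    return clock_period_ps / fmean(readings)


# Channel 1, line 1 from table 4.2 (ten read-outs)
ch1_line1 = [442, 439, 439, 441, 442, 437, 442, 442, 438, 442]

if __name__ == "__main__":
    print(round(delay_per_step_ps(ch1_line1), 1))  # 22.7
```

The mean of these read-outs is 440.4 steps, which reproduces the 22.7 ps/step listed in the table.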

Tests with a configuration that includes a simple internal calibration showed differences in both precision and fluctuation. Table 4.3 shows the propagation in the design with the added logic; it is also plotted in fig. A.2.

| Read | Ch 1 Line 1 | Ch 1 Line 2 | Ch 2 Line 1 | Ch 2 Line 2 | Ch 3 Line 1 | Ch 3 Line 2 | Ch 4 Line 1 | Ch 4 Line 2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 406 | 474 | 410 | 468 | 394 | 468 | 408 | 474 |
| 2 | 410 | 474 | 410 | 468 | 394 | 470 | 408 | 474 |
| 3 | 408 | 475 | 408 | 468 | 394 | 474 | 410 | 474 |
| 4 | 410 | 476 | 410 | 468 | 394 | 470 | 409 | 474 |
| 5 | 406 | 476 | 412 | 468 | 394 | 468 | 408 | 476 |
| 6 | 408 | 476 | 411 | 466 | 393 | 470 | 407 | 474 |
| 7 | 406 | 476 | 410 | 467 | 390 | 470 | 408 | 474 |
| 8 | 409 | 476 | 410 | 466 | 398 | 470 | 408 | 474 |
| 9 | 406 | 476 | 412 | 468 | 398 | 472 | 407 | 474 |
| 10 | 406 | 476 | 410 | 466 | 394 | 473 | 406 | 474 |
| ≈Delay (ps/step) | 24.5 | 21.0 | 24.4 | 21.4 | 25.4 | 21.3 | 24.5 | 21.1 |

Table 4.3: Propagation (number of carry elements) for ten consecutive read-outs in the design with added hardware calibration logic.


Fig. 4.3 collects the calculated maximum and minimum values of the variance. They are based on 10 read-outs during each of 10 restarts of the system, giving a total of 100 measured values.

[Figure 4.3 plot: propagation variation (number of elements, axis 0 to 10) for 1, 2 and 4 channels; series: max and min, each with and without extra calibration hardware]

Figure 4.3: A comparison between maximum and minimum propagation variance depending on number of channels.

The maximum variance is about 10 steps for the four-channel configuration, which is approximately two to three percent of the total propagation value.
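The exact estimator behind fig. 4.3 is not stated, so the sketch below assumes the sample variance over the read-outs of one restart, followed by the minimum and maximum over all restarts. The helper names `variance_min_max` and `relative_percent` are hypothetical. It also shows where the "two to three percent" comes from.

```python
from statistics import variance


# Hypothetical helper: sample variance within each restart, then the
# minimum and maximum over all restarts (the estimator is an assumption).
def variance_min_max(restarts: list[list[int]]) -> tuple[float, float]:
    per_restart = [variance(reads) for reads in restarts]  # each needs >= 2 reads
    return min(per_restart), max(per_restart)


def relative_percent(variation_steps: float, total_steps: float) -> float:
    """Variation as a percentage of the total propagation."""
    return 100.0 * variation_steps / total_steps


# One restart: channel 1, line 1 from table 4.2
ch1_line1 = [442, 439, 439, 441, 442, 437, 442, 442, 438, 442]

if __name__ == "__main__":
    low, high = variance_min_max([ch1_line1])
    print(f"sample variance: {high:.2f}")  # 3.82
    # About 10 steps against the longest (439) and shortest (376) lines in table 4.1
    print(f"{relative_percent(10, 439):.1f} % to {relative_percent(10, 376):.1f} %")
```

Against chain lengths of 376 to 439 steps, a variation of 10 steps corresponds to roughly 2.3 to 2.7 percent.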

Some tests have also been done with an alternative encoder that, instead of counting the ones, traces the first one. Some values from these measurements are listed in table 4.4, with values from the summation encoder for comparison.

| | Summation encoder Min | Summation encoder Max | Trace encoder Min | Trace encoder Max |
|---|---|---|---|---|
| Variance | 0 | 7.73 | 0 | 23.51 |
| Median | 374 | 440 | 368 | 495 |

Table 4.4: Minimum and maximum values of the variance and median for the summation encoder and the trace encoder. As in the previous test, the result is based on 10 read-outs during each of 10 restarts of the system, giving a total of 100 measured values.
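The difference between the two encoders can be illustrated on a sampled thermometer code. The trace encoder is not specified in detail here, so the sketch assumes one plausible reading: it scans from the far end of the line and reports the position of the first '1' it meets. Under that assumption, a stray one beyond the true edge pushes the trace result up, while the summation (ones-count) encoder only counts ones. This is consistent with, but not confirmed by, the higher trace-encoder medians in table 4.4. The function names are hypothetical.

```python
def summation_encode(sample: list[int]) -> int:
    """Ones-count: the number of carry elements sampled as '1'."""
    return sum(sample)


def trace_encode(sample: list[int]) -> int:
    """Hypothetical trace encoder: scan from the end of the line and return
    the 1-based position of the first '1' found (0 if no bit is set)."""
    for pos in range(len(sample), 0, -1):
        if sample[pos - 1]:
            return pos
    return 0


# Thermometer code with a bubble (the 0 at index 3) and a stray 1 (index 6)
example = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    print(summation_encode(example), trace_encode(example))  # 5 7
```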


4.4 Resource usage

Table 4.5 gives an overview of how much logic the design uses.

| Resource | Used | Available | Utilization |
|---|---|---|---|
| Registers | 5702 | 106400 | 5% |
| LUTs | 14576 | 53200 | 27% |
| Slices | 4384 | 13300 | 32% |
| FIFO18E1 | 4 | 140 | 1% |
| IOs | 15 | 200 | 7% |
| BUFGs | 1 | 32 | 3% |

Table 4.5: Resource utilization on the FPGA for the four-channel TDC (part xc7z020clg484-1).

Some of these resources overlap, e.g. LUTs are contained in slices, but the table gives an overview of which resources this application uses. BUFGs are global buffers that are typically used to minimize clock skew between logic domains or to distribute control signals quickly.

Unfortunately, it is somewhat difficult to estimate how much of these resources each part of the design uses, because the tool does not always keep the hierarchy intact during optimization. However, the size of some parts could be estimated from the MRP file, a report file from PlanAhead, and others could be estimated by hand. These values are collected in table 4.6.

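| Resource | Encoder[0] | Encoder[1] | Encoder[2] | Encoder[3] |
|---|---|---|---|---|
| Registers | 181 | 148 | 148 | 148 |
| LUTs | 2490 | 2353 | 2351 | 2340 |
| Slices | 862 | 735 | 786 | 795 |

| Resource | Delay line | FIFO |
|---|---|---|
| Registers | 306 | 0 |
| LUTs | 1206 | 1 |
| Slices | 1202 | 1 |
| FIFO18E1 | 0 | 1 |

Table 4.6: Estimated resource usage for parts of the design.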


Table 4.7 shows the resource usage with the added calibration logic and the size difference in percent compared with the system without calibration logic.

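| Resource | Used | Available | Utilization | Difference |
|---|---|---|---|---|
| Registers | 5931 | 106400 | 5% | 3.86% |
| LUTs | 15418 | 53200 | 28% | 5.46% |
| Slices | 4492 | 13300 | 33% | 2.4% |
| FIFO18E1 | 4 | 140 | 1% | 0% |
| DSP48E1 | 8 | 220 | 3% | 100% |
| IOs | 15 | 200 | 7% | 0% |
| BUFGs | 1 | 32 | 3% | 0% |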

Table 4.7: Resource utilization on the FPGA for the four-channel TDC with calibration, and the size difference.
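The Difference column can be reproduced from tables 4.5 and 4.7. The arithmetic below shows that the listed values (3.86 %, 5.46 %, 2.4 % and 100 %) are obtained when the increase is divided by the count of the calibrated design; relative to the uncalibrated design the increases are slightly larger. The helper names are illustrative only.

```python
# Resource counts (without calibration, with calibration) from tables 4.5 and 4.7
USAGE = {
    "Registers": (5702, 5931),
    "LUTs": (14576, 15418),
    "Slices": (4384, 4492),
    "DSP48E1": (0, 8),
}


def share_of_new(before: int, after: int) -> float:
    """Increase as a percentage of the calibrated design's count."""
    return 100.0 * (after - before) / after


def share_of_old(before: int, after: int) -> float:
    """Increase as a percentage of the uncalibrated design's count."""
    return 100.0 * (after - before) / before  # undefined when before == 0


if __name__ == "__main__":
    for name, (before, after) in USAGE.items():
        line = f"{name}: {share_of_new(before, after):.2f} % of new"
        if before:
            line += f", {share_of_old(before, after):.2f} % of old"
        print(line)
```

For example, the registers grow by 229, which is 3.86 % of 5931 but about 4.02 % of 5702.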

Table 4.7 shows that, despite the added logic, the system still does not use more than a third of the available logic.

However, with a trace encoder the usage of LUTs and slices roughly doubles compared with the previous design, as shown in table 4.8.

| Resource | Used | Available | Utilization |
|---|---|---|---|
| Registers | 6408 | 106400 | 6% |
| LUTs | 29878 | 53200 | 56% |
| Slices | 8810 | 13300 | 66% |
| FIFO18E1 | 4 | 140 | 1% |
| IOs | 15 | 200 | 7% |
| BUFGs | 1 | 32 | 3% |

Table 4.8: Resource utilization on the FPGA for the four-channel TDC with the trace encoder.


4.5 Error sources

This thesis makes some simplifications. One of them is that no analysis has been made of how the carry accelerator in the carry logic could affect the propagation time and give it a nonlinear character.

There is also clock skew that has not been considered in this thesis, which could affect the actual results. The timing report in the synthesis tool gives a clock uncertainty of approximately 0.035 ns, and the clock path skew is summarized in table 4.9.

| Number of channels | 1 | 2 | 4 |
|---|---|---|---|
| Without calibration | 0.019 | 0.002 | 0.067 |
| With calibration | 0.060 | 0.030 | 0.007 |

Table 4.9: Clock path skew in ns, retrieved through the timing report in synthesis.

In the timing report, the maximum data path time from CIN to COUT in the CARRY4 element, T_BYP, is estimated to be 0.114 to 0.117 ns.

During testing there is no control of or compensation for the chip temperature, which could affect the delay time. The temperature was monitored in a simple way using ChipScope. It fluctuated within a span of less than two degrees Celsius for all implementations; however, the chip got hotter with each increase in the number of channels and design size.


5 Discussion

Testing and evaluation show that the placement of the carry chain cells is crucial: the precision depends on where the carry chain is placed.

One finding is that the TDC achieves a higher resolution when fewer channels are implemented on the chip. This could be because there is less surrounding logic interfering with the signals.

The number of unknown parameters and their effect on the delay time make it difficult to estimate the correct value by calculation in an FPGA. It is, however, possible to measure the delay time and parameters during testing.

Although a number of parameters affect the delay time through the delay elements, they do not appear to cause excessive fluctuation in propagation time. This system does not compensate for temperature changes, which could contribute to the fluctuations.

Fig. A.1 and fig. A.2 in appendix A show that the signal propagation changes between read-outs. This could be caused by a combination of clock uncertainty inside the FPGA logic and other sources, such as interference from the surrounding logic.

The clock uncertainty corresponds to between 1 and 2 delay steps, which is also the most common fluctuation size, as can be seen in table 4.2 and table 4.3. There is also a tendency for the propagation variance to increase when more logic is utilized, see fig. 4.3 and table 4.4.
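The "1 to 2 delay steps" follows from dividing the reported clock uncertainty of about 0.035 ns by the delay per step. The sketch below uses the smallest and largest delays per step from tables 4.2 and 4.3; the function name is illustrative only.

```python
CLOCK_UNCERTAINTY_PS = 35.0  # 0.035 ns from the synthesis timing report


def uncertainty_in_steps(delay_per_step_ps: float) -> float:
    """Clock uncertainty expressed as a number of carry steps."""
    return CLOCK_UNCERTAINTY_PS / delay_per_step_ps


if __name__ == "__main__":
    # Smallest and largest delay per step in tables 4.2 and 4.3
    for step in (21.0, 26.7):
        print(f"{step} ps/step -> {uncertainty_in_steps(step):.2f} steps")
```

This gives roughly 1.3 to 1.7 steps across the measured chains.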

The system achieves a resolution of 20 to 30 ps. Although this is coarser than the 10 ps of [Harmen and Edoardo, 2011], it is within the required resolution for the given application, even though the system is implemented on lower-performance hardware than a Virtex-series FPGA. However, there is no compensation for errors caused by clock skew and temperature variations.

[Harmen and Edoardo, 2011] conclude that the surrounding logic influences the TDC, and demonstrate the benefits of placing guard slices around the carry chain. Since the surrounding logic affects performance in this system as well, a similar procedure could be tested here and would probably improve it.

The hardware-based calibration is not recommended for synthesis in its current state because of the decrease in resolution; calibration through a software algorithm is recommended instead. The reason is that the hardware version tends to increase the fluctuation of the propagation time in implementations with many channels.


6 Conclusions

The placement and implementation of the carry chain have a high impact on the resolution that can be obtained on the chip. The logic implemented around the carry chain also affects its behaviour, even when that logic has no direct link to the carry chain during operation.

Implementing a TDC does not require a large amount of logic. This system could also be implemented on a smaller Zynq than the one used in this thesis, such as the Z7010 used on smaller development boards like MicroZed or PicoZed.

Even though it is possible to add many modules and functions, they should be used sparingly, given the decrease in TDC accuracy that each added module causes.


7 Future work

• One issue that needs to be investigated is how the design and technology depend on temperature.

• The delay line could be locked to a specified place in every synthesis. It is also possible to lock the routing so that no undesired connections to the carry chain appear after future changes. The carry acceleration should also be estimated to obtain a more accurate decoding of the delay line, since it could cause a deviation from the linear propagation time assumed in this work.

• Create a calibration function that compensates for skew between delay elements. This could be done by using the internal PLL to feed the delay line with a 2 GHz signal for 20 to 100 ns, and from this determine where the changes between one and zero occur. This would show whether there are sections inside the delay line with a different propagation speed.

• For the system to work together with a QKD system, a module for time-stamping the measured values needs to be added.
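The calibration idea in the third item can be sketched as follows. The sketch assumes that a sample of the delay line holds a snapshot of an ideal 2 GHz square wave with a 50 % duty cycle, so that consecutive 1/0 transitions are half a period (250 ps) apart; the spacing between transitions then gives the local delay per element for each section of the line. The function names and the toy snapshot are hypothetical.

```python
SIGNAL_HZ = 2e9
HALF_PERIOD_PS = 0.5 * 1e12 / SIGNAL_HZ  # 250 ps between consecutive edges


def transition_indices(sample: list[int]) -> list[int]:
    """Indices where the sampled bit differs from the previous element."""
    return [i for i in range(1, len(sample)) if sample[i] != sample[i - 1]]


def section_delays_ps(sample: list[int]) -> list[float]:
    """Delay per element between each pair of consecutive transitions."""
    edges = transition_indices(sample)
    return [HALF_PERIOD_PS / (b - a) for a, b in zip(edges, edges[1:])]


# Toy snapshot: 10 elements per half period (25 ps each), then 12 (about 20.8 ps each)
snapshot = [1] * 10 + [0] * 10 + [1] * 12 + [0] * 5

if __name__ == "__main__":
    print([round(d, 1) for d in section_delays_ps(snapshot)])  # [25.0, 20.8]
```

Sections whose delay per element differs from the rest of the line would stand out directly in this list.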


Bibliography

AMBA AXI and ACE Protocol Specification. ARM, www.AMBA.com, issue D, October 2011. ID102711. Cited on page 19.

Claudio Favi and Edoardo Charbon. A 17ps time-to-digital converter implemented in 65nm FPGA technology. FPGA'09, February 22-24, Monterey, California, USA, pages 113–120, 2009. Cited on page 11.

Harmen Menninga, Claudio Favi, Matthew W. Fishburn, and Edoardo Charbon. A multi-channel, 10ps resolution, FPGA-based TDC with 300MS/s throughput for open-source PET applications. IEEE Nuclear Science Symposium Conference Record, (31-2):1515–1522, 2011. Cited on pages 10, 11, 31, and 32.

Stephan Henzler. Time-to-digital converters. Springer, 2010. Cited on pages 8, 9, 10, and 11.

Peter Marwedel. Embedded System Design - Embedded Systems Foundations of Cyber-Physical Systems. Springer, second edition, 2011. Cited on page 5.

Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge, 2000. Cited on pages 1 and 2.

Qi Shen, Shengkai Liao, Shubin Liu, Jinhong Wang, and Weiyue Liu. An FPGA-based TDC for free space quantum key distribution. IEEE Transactions on Nuclear Science, 60(5):3570–3577, 2013. Cited on page 1.

Wayne Wolf. FPGA-Based System Design. Prentice Hall Modern Semiconductor Design Series. Prentice Hall, 2004. Cited on page 5.

Xi Qin, Changqing Feng, Deliang Zhang, Bin Miao, Lei Zhao, Xinjun Hao, Shubin Liu, and Qi An. Development of a high resolution TDC for implementation in flash-based and anti-fuse FPGAs for aerospace application. IEEE Transactions on Nuclear Science, 60(5):3550–3556, 2013. Cited on page 10.

AXI Reference Guide. Xilinx, www.xilinx.com, v14.3 edition, November 2012. UG761. Cited on page 19.

Zynq-7000 All Programmable SoC Overview. Xilinx, www.xilinx.com, v1.6 edition, December 2013. DS190. Cited on pages 10, 13, and 14.


7 Series FPGAs Configurable Logic - User Guide. Xilinx, www.xilinx.com, v1.6 edition, August 2014a. UG474. Cited on pages 6 and 7.

7 Series FPGAs Memory Resources - User Guide. Xilinx, www.xilinx.com, v1.11 edition, November 2014b. UG473. Cited on page 8.

7 Series DSP48E1 Slice - User Guide. Xilinx, www.xilinx.com, v1.8 edition, November 2014c. UG479. Cited on page 8.


A Plots

This appendix presents two plots with the results of a test with ten read-outs for each channel, from two different designs. Both have four input channels and a summation encoder, but the design in fig. A.2 has additional logic for hardware calibration. The time between read-outs is one clock cycle at 100 MHz.

[Figure A.1 plot: propagation (number of elements, axis 380 to 440) versus read order 1 to 10 for channels 1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1 and 4.2]

Figure A.1: Propagation without extra calibration hardware.


[Figure A.2 plot: propagation (number of elements, axis 400 to 480) versus read order 1 to 10 for channels 1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1 and 4.2]

Figure A.2: Propagation with extra calibration hardware.



Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/
