Utilizing FPGAs for data acquisition at high data rates

Department of Electrical Engineering

Master's thesis

Master's thesis carried out in Electronics at Linköping Institute of Technology (Tekniska högskolan i Linköping)

by

Mats Carlsson

LITH-ISY-EX--09/4298--SE

Linköping 2009

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet

Supervisor: Rashad Ramzan, ISY, Linköpings universitet

Examiner: Christer Svensson, ISY, Linköpings universitet


Division, Department: Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2009-03-27

Language: English

Report category: Master's thesis (Examensarbete)

URL for electronic version: http://www.control.isy.liu.se, http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-17820

ISRN: LITH-ISY-EX--09/4298--SE

Title: Användning av FPGAer för insamling av höghastighetsdata (Utilizing FPGAs for data acquisition at high data rates)

Author: Mats Carlsson


Abstract

The aim of this thesis was to configure an FPGA with high speed ports to capture data from a prototype 4 bit Σ∆ analogue-to-digital converter sending data at a rate of 2.4 Gbps in each of four channels, and to develop a protocol for transferring the data to a PC for analysis. Data arriving in the four channels should be sorted into 4 bit words with one bit taken successively from each of the channels. A requirement on the data transfer was that the data in the four channels should arrive synchronously at the FPGA. A Virtex-5 FPGA on an LX110T platform was used, with RocketIO GTP transceivers tightly integrated with the FPGA logic.

Since the actual DUT (Device Under Test) was not in place during the work, the transceivers of the FPGA were used for both sending and receiving data. The transmission was shown to be successful for both eight and ten bit data widths. At this stage a small skew between the data in the four channels was observed. This was solved by storing the information in separate memories, one for each of the channels, which makes it possible to form the 4 bit words later in the PC (MatLab). The memories were two port FIFOs, written at 240 MHz (10 bit data width) or 300 MHz (8 bit data width) and read out at 50 MHz.

Summary

The purpose of this thesis work was to configure an FPGA with high speed ports so that data from a prototype of a 4 bit Σ∆ analogue-to-digital converter can be captured at a rate of 2.4 Gbps in each of four channels, and to develop a protocol for transferring these data from the FPGA to a PC for analysis. The captured data are to be sorted into 4 bit words with one bit taken successively from each of the channels. A requirement on the data transfer was that the data in the four channels should arrive synchronously at the FPGA. A Virtex-5 FPGA on an LX110T platform was used, with RocketIO GTP transceivers tightly integrated with the FPGA logic. Since the equipment to be tested was not available while the work was carried out, the transceivers of the FPGA were used both to send and to receive data. Transfer of data with both 8 and 10 bit data widths was achieved successfully. The data in the four channels, however, turned out not to arrive synchronously at the receiver. This problem was solved by storing the information in separate memories, one for each channel, transferring the data from the memories to the PC and sorting them there into 4 bit words with the help of MatLab. Two port FIFOs were used as memories, with data written at 240 MHz (10 bit data width) or 300 MHz (8 bit data width) and read out at 50 MHz.


Acknowledgments

I would like to thank Christer Svensson for giving me the opportunity to do this interesting thesis work and also for guiding me in writing the report. I thank my supervisor Rashad Ramzan for all help and support during my work with the thesis, Anton Blad for help in the lab and my mother Gudrun Alm Carlsson for proofreading the manuscript. Finally, I want to thank my fiancée Kajsa Tibell for her support and for letting me work late in the evenings, and my son Darrell for giving me opportunities to rest and forget about the work.


2.3.1 GTP Transmitter (TX)
2.3.2 GTP Receiver (RX)
2.3.3 Shared PMA PLL
2.3.4 Clock domains
2.4 Development board
2.4.1 Superclock module
2.4.2 Serial interface (RS232)
2.5 ISE
3 Implementations
3.1 The New project Wizard
3.2 The RocketIO GTP Transceiver Wizard
3.2.1 Generated files
3.2.2 Clock Connections
3.3 Loopbacks
3.4 Near-End PCS Loopback
3.4.1 PRBS
3.4.2 Own produced data
3.5 Near-End PMA Loopback
3.5.1 Comma detect
3.6 Far-End PMA Loopback
3.7 Structure of each project
3.8 Sending and receiving data
3.8.1 Sending and receiving data over one link
3.8.2 Sending and receiving data over two links
3.8.3 Sending and receiving data over four links
3.9 Protocol between PC and FPGA
3.9.1 Prerequisite for creating the protocols
3.9.2 Communication between PC and FPGA
3.9.3 Protocol for eight bit data over one line
3.9.4 Protocol for ten bit data over one line
3.9.5 Protocol for eight bit data over 4 lines
3.9.6 Protocol for ten bit data over 4 lines
3.10 MatLab
3.11 Channel bonding
3.12 Two dual tiles
4 Results
4.1 Design of the transceivers
4.2 Design of the FIFO registers
4.3 MatLab
4.4 Design of the protocol
4.5 Complete solution
5 Discussion
6 Future work
7 Conclusion

Chapter 1

Introduction

1.1 The task

The main purpose of this thesis is to enable data from a Σ∆ analogue-to-digital converter (called Σ∆-converter in the following), sent over four channels at 2.4 Gbps, to be received by an FPGA (Field Programmable Gate Array) and subsequently transmitted to a PC (Personal Computer) for analysis. The data in the four channels form 4 bit words read in the vertical direction, see Figure 1.1. These 4 bit words are the ones to be analyzed in the PC.

A subproject was to see whether ten bit words could be received and fed into the FPGA logic. This would give more options for how the data can be received (see Background). The chip with the Σ∆-converter was not available while the thesis work was carried out. Instead, the transmitter of the transceiver in the FPGA was used for sending data.

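[Figure 1.1: four serial channels at 2.4 Gbps enter the Virtex-5, which forwards the data to the PC. Bits shown in the four channels:]

1 0 0 1 0 1 0 1 1 1 1
1 1 1 1 0 0 1 1 1 1 1
1 0 1 1 1 1 0 0 0 1 0
0 0 0 1 0 1 0 1 0 1 1

Figure 1.1. Block diagram illustrating the task of this work.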

At this point the idea was to receive the data synchronously in the four receivers at the dual tiles, combine the data into four bit words in the FPGA logic and build a protocol in order to transfer these four bit words to the PC for analysis.

A Virtex-5 FPGA on the LX110T platform was available. This platform contains RocketIO GTP (Giga Transceiver Peripheral) transceivers closely integrated with the FPGA logic.

1.2 Structure of the GTP transceiver

Figure 1.2 shows a block diagram of the transceiver containing one transmitter and one receiver. Data are transmitted and received serially. In the receiver, the data are first parallelized in the SIPO (Serial In, Parallel Out) before being transferred to the FPGA logic. Data sent from within the FPGA (PRBS (Pseudo Random Bit Sequence) generator or FPGA logic) arrives in parallel at the PISO (Parallel In, Serial Out) where they are serialized before being sent. In this project, serial data are transmitted and received at a rate of 2.4 Gbps. The rates of the parallel clocks inside the transceiver are 240 MHz (sending and receiving 10 bit data) or 300 MHz (sending and receiving 8 bit data). Most of the FPGA logic, driven by a separate clock oscillator on the board (see Background), works at a rate of 50 MHz.

Figure 1.2. Block diagram of the structure of the transceiver [6]. The three loopbacks used in this work (near-end PCS, near-end PMA and far-end PMA) are indicated (see section 1.3).
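The parallel clock rates quoted for the transceiver follow from dividing the serial line rate by the width of the parallel word. The short Python check below is only an illustration of that arithmetic; the function name is hypothetical and not part of any Xilinx tool.

```python
def parallel_clock_mhz(line_rate_gbps: float, word_width_bits: int) -> float:
    """Parallel word clock in MHz for a given serial line rate and word width."""
    return line_rate_gbps * 1000.0 / word_width_bits


if __name__ == "__main__":
    # 2.4 Gbps serial data, parallelized into 8 bit or 10 bit words
    for width in (8, 10):
        print(f"{width} bit words: {parallel_clock_mhz(2.4, width):.0f} MHz")
```

With a 2.4 Gbps line rate this gives the 300 MHz (8 bit) and 240 MHz (10 bit) parallel clocks mentioned in the text.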

Clocking architecture

Each transceiver has a shared PLL (Phase Locked Loop) fed by a reference clock. The PLL provides both the transmitter and the receiver with two clocks, one serial and one parallel. It also provides a second transceiver with serial and parallel clocks, which explains why it is called a shared PLL (shared between two transceivers forming a dual tile). The parallel clocks produced by the PLL are called XCLK in the block diagram (see Figure 1.2). The clocks in the ’PCS parallel clock’ section and the ’FPGA parallel’ section (the user clocks) of the receiver and transmitter are sourced either by the XCLK in the PMA parallel section or by an independent reference clock (REFCLKOUT).

1.3 Tests used to understand the functioning of the transceiver

To understand how the transceiver works, three of the four loopbacks suggested by Xilinx were used. The first loopback, the ’Near-end PCS (Physical Coding Sublayer) loopback’, tests the parallel data section. In this loopback, the XCLK at the transmitter side (TXOUTCLK) was synchronized with the RXUSRCLK, since the XCLK at the receiver side (RXRECCLK) is not necessarily in phase with the TXOUTCLK, see figure 1.3.

[Figure 1.3: timing of the signals TXOUTCLK, DATA, RXRECCLK and RXUSRCLK.]

Figure 1.3. The figure shows how the synchronization works.

The second loopback, the ’Near-end PMA (Physical Medium Attachment) loopback’, also tests the section where the data is serialized. The third loopback, the ’Far-end PMA loopback’, includes the whole transceiver sending and receiving the data at 2.4 Gbps. The loopbacks are indicated in the block diagram of the transceiver in figure 1.2.

1.4 Communication between the FPGA and the PC - development of a protocol

Received data should be sent to the PC for analysis using the serial interface (RS (Recommended Standard) 232). MatLab was chosen for the communication on the PC side. The MatLab program should communicate with the FPGA via a protocol (describing how data into and out of the FPGA is handled). For this communication a UART (Universal Asynchronous Receiver/Transmitter) is needed to convert received serial data to parallel data, and the reverse for data to be sent. Since there was no UART available on the FPGA board, a UART first had to be implemented in the FPGA.

The protocol was developed in several steps. The first protocol was developed in order to find out how the communication between the FPGA and the PC worked via the UART. The PC sends commands to the protocol, which responds by sending one or two bytes back to the PC (for more information, see 3.9).

At this stage, it had been noticed that there was a skew between the lines containing the data (see 3.8.3, skew registration). Because of this skew, the original idea of how the four bit words should be formed and transferred to the PC could not work. If, for instance, the data in the third line in Figure 1.1 arrives one sample earlier than in the other lines, the correct four bit word ’1011’ will instead be read as ’1001’.
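The effect of the skew on the four bit words can be illustrated with a few lines of Python. This is only a sketch: the four bit strings are the channel contents of Figure 1.1 under the assumption that the figure is read row by row, and the function name and the `skew` parameter are introduced here for illustration.

```python
# Channel contents of Figure 1.1 (row-wise reading of the figure is an assumption).
CHANNELS = [
    "10010101111",
    "11110011111",
    "10111100010",
    "00010101011",
]


def form_words(channels, skew=(0, 0, 0, 0)):
    """Build 4 bit words column by column.

    skew[i] is the number of samples by which channel i arrives early.
    """
    n = min(len(ch) - s for ch, s in zip(channels, skew))
    return ["".join(ch[k + s] for ch, s in zip(channels, skew)) for k in range(n)]


aligned = form_words(CHANNELS)
skewed = form_words(CHANNELS, skew=(0, 0, 1, 0))  # third line one sample early

# Sixth word: correct value versus the value seen with the skewed third line
print(aligned[5], "->", skewed[5])
```

With the third line one sample early, the sixth word changes from 1011 to 1001, which is the corruption described in the text.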

Because the serial communication over RS232 transfers the data at a very low rate (11.52 kHz), a memory is needed so that no information is lost. To solve the problem with the skew mentioned above, one FIFO (First In First Out) memory was created for each of the channels. Because the protocol inside the FPGA is clocked by a 50 MHz oscillator and the data from the receivers arrive at 240 or 300 MHz, two port FIFOs were needed. A two port FIFO has independent write and read clocks, so that information can be written into the FIFO at one rate and read from it at another. This makes it possible to read out from the FIFO at only 50 MHz without losing information. Such FIFOs were generated using the ISE (Integrated Software Environment) program, a software tool provided by Xilinx. The data from each memory are transferred to the PC. Once in the PC, the bits from each memory are assembled into the correct four bit words (see 4.2).

To start with, a protocol that can transfer eight bit data over one link (received from one receiver of the dual tile) to the PC was created. A similar protocol was then created for the transfer of ten bit data words. The main difference compared to the protocol for eight bit data is that this protocol has to send the ten bit word to the PC divided between two bytes. It was successfully shown that a ten bit word received from the dual tile remains ten bits when fed into the FPGA logic.
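Dividing a ten bit word between two bytes can be sketched as follows. The byte layout used here (low eight bits first, the two remaining bits in the second byte) is an assumption made for illustration; the layout actually used by the protocol is described in section 3.9.

```python
def split_word10(word: int) -> bytes:
    """Split a 10 bit word into two bytes: low 8 bits, then the top 2 bits."""
    if not 0 <= word < 1024:
        raise ValueError("expected a 10 bit value")
    return bytes([word & 0xFF, (word >> 8) & 0x03])


def join_word10(pair: bytes) -> int:
    """Rebuild the 10 bit word on the PC side from the two received bytes."""
    low, high = pair
    return ((high & 0x03) << 8) | low


if __name__ == "__main__":
    word = 0b1010100101
    sent = split_word10(word)
    print(sent.hex(), join_word10(sent) == word)
```

Six bits of the second byte stay unused in this layout, which is the price of sending ten bit words over a byte-oriented serial link.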

The task could finally be completed with four receivers combined with four two port FIFOs and with associated protocol. Once the task had been completed, it was possible to see that no disturbances existed between the links. Furthermore, the skew that had been noticed, could now be measured and was found to be up to three samples.

A test was also made to eliminate the skew between the channels using channel bonding. This was done with a counter producing the data. The test was successful for its purpose, and channel bonding could later be useful if the skew depends on start-up problems. It cannot be used with the present Σ∆-chip that will arrive, since this chip cannot generate suitable test signals for measuring the skew and making channel bonding work. The idea is that this could be possible with a new Σ∆-chip that includes a counter which generates data for the four channels, so that the four receivers can be synchronized with channel bonding. After this, the Σ∆-converter is connected to the receivers instead of the counter.

1.5 Abbreviations

ADC Analogue to Digital Converter

ASIC Application Specific Integrated Circuit

CDR Clock Data Recovery

CML Current Mode Logic

CMOS Complementary Metal Oxide Semiconductor

DAC Digital to Analogue Converter

DIP Dual In-line Package

FIFO First In First Out

GTP Giga Transceiver Peripheral

HDL Hardware Description Language

ISE Integrated Software Environment

LED Light Emitting Diode

LVCMOS Low Voltage Complementary Metal Oxide Semiconductor

LVDS Low Voltage Differential Signal

OSR Over Sampling Ratio

PC Personal Computer


PLL Phase Locked Loop

PMA Physical Medium Attachment

PRBS Pseudo Random Bit Sequences

RS232 Recommended Standard 232

SMA Sub Miniature version A

UART Universal Asynchronous Receiver/Transmitter

UCF User Constraint File

USB Universal Serial Bus

VHDL VHSIC (Very High Speed Integrated Circuit) Hardware Description Language


Chapter 2

Background

This chapter is structured as follows. Principles of a Σ∆-converter and the Σ∆-chip are described in Section 2.1. The Virtex-5 FPGA and the RocketIO GTP used in the project are described in Sections 2.2 and 2.3. The associated development board is described in Section 2.4, followed by a short description of the ISE software package in Section 2.5. Sources of information are the references [1-9].

2.1 Σ∆-converter

2.1.1 Basic principles of a Σ∆-converter

[Figure 2.1: analog input, summing node, integrator (∫) and ADC giving the digital output, with a DAC in the feedback path. Inset diagrams A, B and C: power versus frequency, with axes marked fs/2 and M·fs/2.]

Figure 2.1. Block diagram of a Σ∆-ADC (Analogue-to-Digital Converter). Inserted diagrams A, B, C show the power of the signal (represented by a single frequency) and noise distribution using A: Nyquist sampling rate, fs, B: oversampling with sampling rate Mfs and C: oversampling with feedback loop.

The Nyquist theorem states that when analogue signals are digitized, the sampling frequency (Nyquist frequency, fs) must be at least twice the highest frequency (fs/2) in the analogue signal if no information in the signal is to be lost. The Σ∆-converter uses oversampling, which means that it samples the analogue signal at a much higher frequency (Mfs) than the Nyquist theorem requires (see figure 2.1).

The result of this oversampling is that the quantization noise is spread over a wider frequency range, extending outside the signal frequency band, so that the noise level in this band is reduced (see diagrams A and B in figure 2.1). The feedback loop with the DAC (Digital-to-Analogue Converter) acts as a low pass filter for the signal and a high pass filter for the noise. In this way, the noise is pushed towards higher frequencies and the noise in the signal frequency band is further reduced (see diagram C in figure 2.1). By feeding the digital output through a low pass filter, the SNR (Signal to Noise Ratio) is drastically improved compared to case A (figure 2.1). Finally, the low pass filtered signal is downsampled to get back to the Nyquist frequency (the process of low pass filtering and downsampling is called decimation).
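The loop described above can be tried out with a small behavioural model. The sketch below is a first-order modulator with a 1 bit quantizer and a crude averaging decimator, chosen for brevity; it is not a model of the 4 bit converter on the chip, and all names are illustrative.

```python
def sigma_delta_1bit(samples):
    """First-order loop: integrate the difference between input and fed-back output."""
    integrator = 0.0
    feedback = 0.0          # DAC output fed back to the summing node
    bits = []
    for x in samples:       # input samples expected in the range [-1, 1]
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def decimate(bits, osr):
    """Crude decimation: average each block of `osr` bits, mapped back to [-1, 1]."""
    return [2.0 * sum(bits[i:i + osr]) / osr - 1.0
            for i in range(0, len(bits) - osr + 1, osr)]


if __name__ == "__main__":
    osr = 120
    stream = sigma_delta_1bit([0.25] * (10 * osr))   # constant input of 0.25
    print(decimate(stream, osr))
```

Although each output sample is only one bit, the averaged blocks reproduce the input level, which is the point of combining oversampling, feedback and decimation. A real decimator would use a proper low pass filter rather than a block average.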

2.1.2 The Σ∆-chip

[Figure 2.2: test chip with a merged wideband LNA + mixer (0.5-6 GHz, LO+ and LO- inputs), Gm stages and low pass filters (LPF) for the I and Q paths (IF 0-10 MHz), two 4 bit Σ∆ ADCs clocked at 2.4 GHz by a clocking circuit, a PRBS block and XOR gates, with 50 Ω outputs at 2.4 Gbps to the FPGA.]

Figure 2.2: Block diagram of the Σ∆-chip with two parallel 4 bit Σ∆-converters.

The chip contains XOR gates which make it possible to choose whether the CLK signals, data from the Σ∆-converters or the PRBS (Pseudo Random Bit Sequence) should be sent. The data are captured by an FPGA, low pass filtered and downsampled [4].

The aim of the Σ∆-chip in figure 2.2, which underlies the task in this thesis, is to receive signals at a bandwidth of 20 MHz. By utilizing a sampling frequency of 2.4 GHz, an oversampling ratio (OSR) of $\mathrm{OSR} = \frac{2400}{2 \cdot 10} = 120$ is achieved. This means that a quantizer accuracy of 4 bits is "converted" to an accuracy of 13.5 bits, $4 + 0.5 \cdot \log_2\!\left(\frac{3 \cdot \mathrm{OSR}^3}{\pi^2}\right)$. A problem is that the data rate from the quantizer is very high, in this case 2.4 Gwords/s with 4 bit words. Therefore, very fast digital logic is needed to capture this signal. The objective of this thesis is to utilize and configure an FPGA with high speed ports to capture these data and transfer them to a PC for analysis. The analysis aims at testing the Σ∆-receiver system and at investigating possible algorithms for correcting any errors in, for example, the DAC (Digital-to-Analogue Converter) in the system [1].
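The two figures quoted above can be checked numerically. The sketch below assumes the usual first-order noise shaping expression, in which the resolution gain in bits is 0.5 · log2(3 · OSR³ / π²); this form is an assumption, chosen because it reproduces the 13.5 bits stated in the text. The function names are illustrative.

```python
import math


def oversampling_ratio(fs_mhz: float, bandwidth_mhz: float) -> float:
    """OSR = fs / (2 * signal bandwidth)."""
    return fs_mhz / (2.0 * bandwidth_mhz)


def effective_bits(quantizer_bits: int, osr: float) -> float:
    """Quantizer bits plus the gain from first-order noise shaping (assumed form)."""
    return quantizer_bits + 0.5 * math.log2(3.0 * osr ** 3 / math.pi ** 2)


if __name__ == "__main__":
    osr = oversampling_ratio(2400.0, 10.0)
    print(osr, round(effective_bits(4, osr), 1))
```

For a 2.4 GHz sampling frequency and a 10 MHz band this gives OSR = 120 and about 13.5 bits.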

2.2 Xilinx Virtex-5 LX110T FPGA

General description of FPGA

An FPGA is a semiconductor device containing ’logic blocks’ with interconnects which can be programmed by the customer to create the logical functions needed for the purpose. This is why it is called ’field programmable’. FPGAs are usually slower and draw more power than ASICs (Application Specific Integrated Circuits), which are designed for a particular application. The advantage is that FPGAs are more flexible since they can be re-programmed to suit different designs. In modern developments, the logic blocks and interconnects of traditional FPGAs are combined with embedded systems such as memories, microprocessors and related peripherals.

Xilinx Virtex-5 LXT FPGA

The FPGAs in the Xilinx Virtex-5 family are specially designed for high speed applications, which is why one is used in this work. Members of this family include individual features to suit different applications. Thus the Virtex-5 LXT contains RocketIO GTP transceivers tightly integrated with the FPGA logic. The version LX110T used in this work contains 16 GTPs (8 dual tiles).

The Virtex-5 family was introduced in 2006. Compared to its predecessor, the Virtex-4, it is faster. One reason for this is that advances in CMOS (Complementary Metal Oxide Semiconductor) technology now allow the gate length to be decreased from 90 nm to 65 nm, resulting in shorter switching times. A drawback with the thinner oxide layer is that it is associated with increased leakage currents. Therefore the 65 nm technology is only used in those components of the logic that are critical to speed performance, and the 90 nm technology is retained for those which are not [2].

2.3 Virtex-5 FPGA RocketIO GTP transceiver

The GTP transceivers are organized as dual tiles. In the dual tile configuration, two transceivers share important functions. Each of the transceivers contains a transmitter (TX) and a receiver (RX). Among the shared functions are the generation of a high-speed serial clock and resets. The dual-tile configuration allows the TX and RX of both transceivers to share a PLL, see Figure 2.3. The PLL reduces the jitter and the shared configuration reduces the size and power consumption of the device.

[Figure 2.3: MGTREFCLKP and MGTREFCLKN feed the IBUFDS buffer, whose output CLKIN drives the shared PMA PLL serving the TX and RX of GTP0 and GTP1.]

Figure 2.3: Organization of the dual-tile with the two transceivers GTP0 and GTP1, each containing a transmitter (TX) and a receiver (RX). From the SMA (Sub Miniature version A) contacts (see 2.4), a differential clock signal passes through the IBUFDS buffer to form the common clock signal, CLKIN, used as reference clock for the PLL.

Correct clocking and reset behavior are critical for any GTP transceiver design. Use of a high-quality crystal oscillator as reference clock is therefore essential for good performance. The reference clock feeding one dual tile can be used to drive neighboring dual tiles. However, to keep the jitter within acceptable margins, it may drive no more than three dual tiles above and three dual tiles below the sourcing tile, so that at most seven dual tiles can be sourced by a common reference clock (see Figure 2.4). Xilinx recommends an external clock as reference clock, since using a clock from inside the FPGA may, depending on the design, cause increased jitter.

Figure 2.4: A reference clock (CLKIN) can feed up to seven dual tiles without creating unacceptable jitter [6].

The GTP transceiver consists of the PCS and PMA blocks in the transmitter (TX) and the receiver (RX) parts of the dual tiles. The TX and RX internal data paths are 8 or 10 bits wide. The components of these blocks are explained below.

2.3.1 GTP Transmitter (TX)

The GTP transmitter includes several blocks with opportunities to choose different paths for the data. In this section a description of the blocks of interest to this project is given.

Figure 2.5: Block diagram of the transmitter [6]

TX Driver

The TX Driver is a high speed output buffer that transforms single ended signals to differential signals. It also includes ’Differential control’ and ’Configurable termination impedance’ to achieve the highest quality of the signal in every situation. The ’Configurable termination impedance’ is not used in this project. The ’Differential control’ sets the amplitude of the differential swing that the signal needs to reach the receiver.

Pre-emphasis

Pre-emphasis control is used to improve the signal. When signals are transmitted between the transmitter and the receiver, the high frequencies are attenuated more than the lower frequencies. To compensate for this, pre-emphasis can be used, which decreases the amplitude of the low frequency signals. The Σ∆-chip is equipped with this option, so it will also be used in this configuration. If the chip did not have this option, the problem could be solved using the RX EQ block (see 2.3.2).

PMA PLL Driver

This driver provides the transmitter with a high quality, low jitter clock signal. For more details about the PLL, see 2.3.3.

PISO

The PISO (Parallel In Serial Out) block transforms the signal from parallel inside the transmitter to serial when it is sent. This is done because the signal is less affected by external disturbances in serial than in parallel mode.

Polarity

The TX polarity control can be used when the hardware causes trouble in sending the signal. It then swaps the polarity of the signal.

Phase Adjust FIFO And Oversampling

Between the PMACLK and the TXUSRCLK clock domains (see figure 2.8) there has to be some circuit to resolve phase differences. This is solved by Xilinx in two ways, either by using the ’TX-buffer’ or the ’Phase Alignment circuit’. The TX-buffer is easy to use and is required when using oversampling but it does not give any benefit in reducing skew between GTP transceivers. If low latency is critical, the TX-buffer must be bypassed. The phase alignment circuit requires extra logic and more demanding clock requirements, for example TXOUTCLK (see figure 2.8, XCLK) cannot be used. If more than one dual tile is in use and they have the same line rate, the phase alignment circuit can reduce skew between them. Oversampling is not used in this project.

PRBS Generate

PRBS stands for ’Pseudo Random Bit Sequence’, which means that a sequence of bits is created which looks random but in fact is generated by an algorithm and repeats with a given periodicity. This makes it possible to send the sequence of bits and receive it at another location knowing what data will be received. The PRBS sequence is often used in industry to check the condition of data links. There are three standard patterns available. In this project, the $2^{23} - 1$ standard is used.
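How such a sequence is produced and checked can be sketched with a linear feedback shift register. The sketch assumes the common polynomial for the 2^23 − 1 pattern, x^23 + x^18 + 1 (feedback taken from stages 23 and 18); the text does not state which polynomial the transceiver uses, and the seed handling and function names are illustrative. Unlike a hardware checker, this one assumes the receiver already knows the seed.

```python
def prbs_bits(order, tap, count, seed=1):
    """Yield `count` bits from a shift register with feedback from stages `order` and `tap`."""
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    for _ in range(count):
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        yield new_bit


def count_errors(received, order, tap, seed=1):
    """Compare received bits with the locally regenerated sequence."""
    expected = prbs_bits(order, tap, len(received), seed)
    return sum(r != e for r, e in zip(received, expected))


if __name__ == "__main__":
    sent = list(prbs_bits(23, 18, 1000))
    received = sent[:]
    received[10] ^= 1                      # inject a single bit error on the link
    print(count_errors(sent, 23, 18), count_errors(received, 23, 18))
```

Because both ends can regenerate the same sequence, every mismatch counts as a transmission error, which is what makes the pattern useful for testing a link.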

In the transmitter there is one PRBS producer (PRBS Generate).

FPGA logic

This block represents the Virtex-5 FPGA, where the logic is written.

2.3.2 GTP Receiver (RX)

The blocks contained in the GTP receiver and used in this work are described below and illustrated in figure 2.6.


RX EQ

The equalizer uses a separate receive buffer to capture the high frequencies, amplify them and add them to the original signal. This serves the same purpose as pre-emphasis in the transmitter: it compensates for the fact that high frequencies are attenuated more than lower frequencies during transmission. The EQ block also contains the CML (Current Mode Logic) of the receiver, with the possibility to adjust the termination impedance so that it matches the impedance of the transmission line carrying the incoming signal, which avoids reflections in the system. The incoming differential transmission lines are both internally terminated with adjustable resistors (50/75 ohm, 100/150 ohm differential), and either DC or AC coupling can be chosen. DC coupling is preferable when the differential signals and common mode voltages of the connected devices match each other. If they do not, AC coupling is usually used and is normally achieved by putting a capacitor in the signal path. To make DC coupling work, the LVDS (Low Voltage Differential Signal) driver must see a 100 ohm termination. Internal AC coupling is an option and is used when DC coupling is chosen but the RX termination is set to GND (Ground), a non-standard termination voltage.

RX CDR

When the data is received, its embedded clock signal has to be recovered. The CDR (’Clock Data Recovery’) does this by taking the divided signal from the PLL, PLL_TXDIVSEL_OUT (see figure 2.7), and adjusting it to the phase and frequency of the incoming clock. For this to succeed, the line rates of the recovered clock and of the receiver must not differ by more than 1000 ppm (parts per million). There must also be a sufficient number of transitions in the incoming data stream. Clock recovery contributes to reducing jitter in the incoming data.

PMA PLL Driver

This driver provides the receiver with a high quality, low jitter clock signal. For more details about the PLL, see 2.3.3.

SIPO

The SIPO (Serial In Parallel Out) block transforms the received serial data to parallel data inside the receiver.

PRBS Check

When the PRBS pattern is used to control a link, the receiver side must have a checker (the PRBS Check), controlling that the data sent from the transmitter is correctly received. The chip including the Σ∆-converter also includes a PRBS generator which sends this pattern.

Polarity

If Polarity is used on the transmitter side, the data has to be swapped once more in the receiver to restore the correct form.


Comma detection

In the PCS section of the transceiver, see figure 2.5 and figure 2.6, data is parallel. In the transmitter, the parallel words go through the PISO (Parallel In Serial Out) block and are sent in serial mode. This is because serial information is less affected by disturbances outside the FPGA. When the data arrives at the receiver it is serial, but it is transformed back to parallel inside the receiver (because this is faster). This is done in the PMA section, see figure 2.6. The data is then parallel, but the receiver has no way of knowing where the words start or end. To make this possible, the transmitter can send a predefined pattern at regular intervals. This pattern is called a ’comma’, and the logic block searching for it is called the comma detection block, see figure 2.6. When the comma is found by the comma detection block, the receiver knows where a word of the chosen length starts.
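The idea behind comma detection can be sketched in software: search the serial bit stream for the agreed pattern and use its position to fix the word boundaries. The comma pattern, the example data words and the function names below are illustrative assumptions, and the sketch ignores the possibility that ordinary data happens to contain the pattern.

```python
def find_alignment(bits: str, comma: str, word_len: int):
    """Return the bit offset (0 .. word_len-1) at which words start, or None."""
    index = bits.find(comma)
    if index < 0:
        return None
    return index % word_len


def split_words(bits: str, offset: int, word_len: int):
    """Cut the stream into complete words starting at the detected offset."""
    usable = bits[offset:]
    return [usable[i:i + word_len]
            for i in range(0, len(usable) - word_len + 1, word_len)]


if __name__ == "__main__":
    comma = "0011111010"                       # illustrative 10 bit comma pattern
    words = [comma, "1010101010", "1100110011"]
    stream = "101" + "".join(words)            # three stray bits before the first word
    offset = find_alignment(stream, comma, 10)
    print(offset, split_words(stream, offset, 10))
```

Once the offset is known, every following group of ten bits is a correctly aligned word until alignment is lost.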

RX Elastic Buffer

On the receiver side, the Phase adjustment circuit or the RX elastic buffer are used to resolve phase differences between the PMACLK and the RXUSRCLK domains, see figure 2.8. The RX elastic buffer can also correct frequency differences using clock correction. However, clock correction does not work with the unsorted data as from the Σ∆-converter. When clock correction is not used, the transmitter and receiver should preferably use the same frequency source to avoid problems with the RX elastic buffer. Frequency differences can alternatively be resolved using the phase alignment circuit and the recovered clock (RXRECCLK) to source the RXUSRCLK. The RX Elastic Buffer can be used with 8 or 10 bit words whereas the phase alignment circuit requires 10 bit words. The RX elastic buffer offers the option to use channel bonding to synchronize data between different lines.

FPGA logic

This block represents the Virtex-5 FPGA, where the logic is written and the information from the received data will be stored/analyzed.

2.3.3 Shared PMA PLL


Each dual tile shares a PMA PLL that is driven from a high quality clock, CLKIN, as seen in figure 2.3. It produces parallel and serial clocks for both transmitters and receivers in the dual tile. The parallel clocks are used in the PCS section of the transmitter. The frequency of the PLL clock is calculated as in equation 2.1.

$$\text{PLLclock} = \frac{\text{PLL\_DIVSEL\_FB}}{\text{PLL\_DIVSEL\_REF}} \cdot \text{CLKIN} \qquad (2.1)$$

For the receiver, the clock rate from the PLL is divided by PLL_RXDIVSEL_OUT_0 for dual tile number zero (GTP0) and by PLL_RXDIVSEL_OUT_1 for dual tile one (GTP1). The divider can take the values 1, 2 or 4. The parallel clock rates are obtained by further dividing by W. The value of W is 4 when eight bit internal data width is used and 5 when ten bit data width is used. This gives the following equation.

$$\text{RX\_Parallel\_Clock} = \frac{\text{PLLclock}}{\text{PLL\_RXDIVSEL\_OUT\_0/1} \cdot W} \qquad (2.2)$$

The RX serial clock rate is obtained by multiplying by two instead of dividing by W, while the division by PLL_RXDIVSEL_OUT remains. The data is triggered on both edges of the serial clock, which explains the multiplication by two.

$$\text{RX\_Serial\_Clock} = \frac{\text{PLLclock} \cdot 2}{\text{PLL\_RXDIVSEL\_OUT\_0/1}} \qquad (2.3)$$

The transmitter section uses the same PLL clock as the RX section (equation 2.1). In both transmitters the PLL clock rate is divided by PLL_TXDIVSEL_COMM_OUT, which can take the values 1, 2 and 4. Then, for dual tile number zero this clock rate is divided by PLL_TXDIVSEL_OUT_0 and for dual tile number one by PLL_TXDIVSEL_OUT_1, in both cases with the possible values 1, 2 and 4. The parallel clock rate is then obtained by dividing by W, which gives equation (2.4).

$$\text{TX\_Parallel\_Clock} = \frac{\text{PLLclock}}{\text{PLL\_TXDIVSEL\_COMM\_OUT} \cdot \text{PLL\_TXDIVSEL\_OUT\_0/1} \cdot W} \qquad (2.4)$$

The serial clock rate is obtained from equation 2.4 by exchanging the division by W with multiplication with two, which gives equation (2.5) for the serial clock rate.

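$$\text{TX\_Serial\_Clock} = \frac{\text{PLLclock} \cdot 2}{\text{PLL\_TXDIVSEL\_COMM\_OUT} \cdot \text{PLL\_TXDIVSEL\_OUT\_0/1}} \qquad (2.5)$$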
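Equations 2.1 to 2.5 can be evaluated numerically with a small Python sketch. The divider settings used at the end are assumptions, chosen only because they reproduce a 2.4 Gbps serial rate with a 240 MHz parallel clock for ten bit width and a 300 MHz parallel clock for eight bit width. They are not settings reported in this work, and the function names are hypothetical.

```python
def pll_clock(clkin, div_fb, div_ref):
    """Equation 2.1: PLL clock from the reference clock CLKIN."""
    return clkin * div_fb / div_ref


def rx_parallel_clock(pll, rxdiv_out, w):
    """Equation 2.2: W is 4 for eight bit and 5 for ten bit internal width."""
    return pll / (rxdiv_out * w)


def rx_serial_clock(pll, rxdiv_out):
    """Equation 2.3: factor two because data is taken on both clock edges."""
    return pll * 2 / rxdiv_out


def tx_parallel_clock(pll, txdiv_comm, txdiv_out, w):
    """Equation 2.4."""
    return pll / (txdiv_comm * txdiv_out * w)


def tx_serial_clock(pll, txdiv_comm, txdiv_out):
    """Equation 2.5."""
    return pll * 2 / (txdiv_comm * txdiv_out)


# Assumed example settings (frequencies in MHz), not taken from the thesis.
pll_10bit = pll_clock(clkin=240.0, div_fb=5, div_ref=1)  # ten bit width, W = 5
pll_8bit = pll_clock(clkin=300.0, div_fb=4, div_ref=1)   # eight bit width, W = 4
```

With these assumed values both PLL clocks are 1200 MHz, `rx_serial_clock(pll_10bit, 1)` gives 2400, `rx_parallel_clock(pll_10bit, 1, 5)` gives 240 and `rx_parallel_clock(pll_8bit, 1, 4)` gives 300.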

2.3.4 Clock domains

The transceiver is divided in separate clock domains for both the transmitter and receiver (see figure 2.8). There are four clock domains, which are described below.

Figure 2.8: The clock domains [6]

1. ’Serial clock’ where a serial clock (TX/RX) generated from the PLL is running.

2. ’PMA parallel clock’ where a parallel clock generated from the PLL is running. This clock is called XCLK in figure 2.8.

3. ’PCS parallel clock’ where the user clock, RXUSRCLK / TXUSRCLK, is running. It is generated inside the FPGA and sourced by the XCLK.

4. ’FPGA parallel clock’ where user clock two, RXUSRCLK2 / TXUSRCLK2, is running. It is generated inside the FPGA and sourced by the XCLK.


Note

In the RX section, the frequency of the XCLK in the PMA parallel section must be sufficiently close to the RXUSRCLK rate in the PCS parallel section. All phase differences between the two clock domains must be resolved, on both the TX and RX sides.

2.4 Development board

There are many ways to configure the transceiver, and Xilinx provides a development board (in the following also called test board), Virtex-5 ML523 [7], to help the designer explore suitable configurations.

The test board used in this work is seen in figure 2.9. The components of the platform are marked with numbers 1-23. Those which have been used in this work are described below.

Figure 2.9: The development board [7]

• 1) The power switch (on/off) to the FPGA board

• 3) The JTAG port, which, when connected to the ’USB Cable Pod’, makes it possible to download the code into the FPGA so that the specified circuit is created.

• 4,6) Clock pair with differential clock signals produced by the Superclock module, which is the source for the reference clock, CLKIN, at the PLL (see figure 2.7).

• 12) Socket for the 50 MHz oscillator that feeds the logic inside the FPGA

• 15) 16 LEDs (Light Emitting Diodes) available to the user

• 16) 16 switches available to the user

• 17) Four buttons available to the user

• 19) Reference clock, CLKIN, for the PLL (see figure 2.7)

• 20) Every marked rectangle contains the differential SMA (Sub Miniature version A) contacts of one tile.

• 21) The contact for the RS-232 connection, for more explanation see 2.4.2

• 22) The Superclock module, for more explanation see 2.4.1

2.4.1 Superclock module

The reference clock used in this work for the dual tiles is produced by the ’SuperClock module’ on the board. It generates a low noise clock from 49 MHz to 640 MHz. Two oscillators are available as the source of this clock, which represents the CLKIN for the dual tiles. They are denoted XTAL0 and XTAL1, where XTAL0 is an oscillator at 19 MHz and XTAL1 one at 25 MHz. The frequency of the selected oscillator is multiplied by the feedback value ’M’, with the alternative values 18, 22, 24, 25, 32 and 40, and divided by the divider selection ’N’, with the possible values 1, 2, 3, 4, 5, 8 and 10. The frequency of the reference clock is obtained from the formula below.

$$\text{CLKIN} = \text{XTAL} \cdot \frac{M}{N} \qquad (2.6)$$
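To see which switch settings produce a wanted reference clock, equation 2.6 can be searched exhaustively. The Python sketch below is only an illustration using the oscillator frequencies and the M and N values listed above. The function name and the 300 MHz target in the usage note are assumptions, and the target is taken in whole MHz so that the comparison is exact.

```python
# Values as listed in the text above (oscillator frequencies in MHz).
XTAL_MHZ = {"XTAL0": 19, "XTAL1": 25}
M_VALUES = (18, 22, 24, 25, 32, 40)
N_VALUES = (1, 2, 3, 4, 5, 8, 10)


def clkin_settings(target_mhz):
    """Return every (oscillator, M, N) for which XTAL * M / N equals the target."""
    hits = []
    for name, xtal in XTAL_MHZ.items():
        for m in M_VALUES:
            for n in N_VALUES:
                # Compare XTAL * M with target * N to stay in integer arithmetic.
                if xtal * m == target_mhz * n:
                    hits.append((name, m, n))
    return hits
```

For an assumed target of 300 MHz, `clkin_settings(300)` returns `[("XTAL1", 24, 2)]`, that is, the 25 MHz oscillator with M = 24 and N = 2.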

Figure 2.10: The combination of SEL0 and SEL1 selects which oscillator is used.

Figure 2.11: The combination of M0, M1 and M2 determines the value of the multiplier [7].

Figure 2.12: The combination of N0, N1 and N2 determines the value of the divider [7].

After the tables shown in figures 2.10-2.12 have been studied, the configuration of the red DIP (Dual In-line Package) switch in figure 2.9 can be made to obtain the appropriate CLKIN frequency needed by the PLL.

When the CLKIN signal is connected it can be found at the 100 Ohm SMA differential clock pairs clk0, clk1, clk2. Depending on the number of dual tiles engaged, one, two or three of these are used to feed the SMA reference clock inputs. Each of these eight reference clock inputs can feed up to seven dual tiles, see section 2.3.

2.4.2 Serial interface (RS232)

The RS232 (Recommended Standard 232) protocol [3] is a standard that is commonly used for transmission between two units at speeds of up to 38.4 kbps over cables of up to 30 meters in length. The standard describes how the data is sent. There is always a start bit followed by seven or eight data bits (in this project eight bits are used). Then there is one optional parity bit and one or two stop bits. The start bit is always zero and the stop bit is always one. If a parity bit is used, it comes after the data bits. With even parity, this bit is set so that the total number of ones in the data bits and the parity bit is even. If the value of the parity bit is not as expected when it arrives at the receiver, the data has been distorted during transmission. There has to be a UART on both sides of the connection so that the data can be serialized and deserialized properly.
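The frame layout described above can be illustrated with a short Python sketch that builds and checks a frame with eight data bits and even parity. Sending the least significant bit first is the usual UART convention and is an assumption here, since the bit order is not stated in the text. The function names are hypothetical.

```python
def uart_frame(byte, stop_bits=1):
    """Build a frame: start bit, eight data bits, even parity bit, stop bit(s)."""
    data = [(byte >> i) & 1 for i in range(8)]  # assumed order: LSB first
    parity = sum(data) % 2  # makes the total number of ones even
    return [0] + data + [parity] + [1] * stop_bits


def uart_decode(frame):
    """Return (byte, ok), where ok is False on a framing or parity error."""
    data = frame[1:9]
    parity = frame[9]
    framing_ok = frame[0] == 0 and all(bit == 1 for bit in frame[10:])
    parity_ok = (sum(data) + parity) % 2 == 0
    byte = sum(bit << i for i, bit in enumerate(data))
    return byte, framing_ok and parity_ok
```

For the byte 0x41, which contains two ones, the parity bit is zero and `uart_frame(0x41)` gives `[0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]`. Flipping any single data bit in that frame makes `uart_decode` report an error.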

2.5 ISE

Xilinx offers the software package ISE (version 10.1 is used in this work), which provides the user with all tools needed to implement the desired logic design in the FPGA. It contains built-in tools such as memory generators, and wizards such as the ’New Project Wizard’ and the ’RocketIO GTP Transceiver Wizard’ that help the user manage the configuration of the transceivers. It supports the hardware description languages (HDLs) Verilog and VHDL.


Chapter 3

Implementations

Xilinx provides a set of wizards, with forms to be filled in, to help the user create a core for a given application. The wizards used are described in 3.1 (The New Project Wizard) and 3.2 (The RocketIO Transceiver Wizard). The loopbacks, their testing and the data (PRBS and own produced data) used in the tests are described in 3.3-3.6. The structure of each project is described in 3.7, and tests of sending and receiving data over up to 4 links and skew registration in 3.8. In 3.9, a protocol for the communication between the FPGA and the PC is developed. A short description of the MatLab program used for the serial communication between the FPGA and the PC is given in 3.10 and the use of channel bonding to synchronize the signals through the channels in 3.11. The chapter closes with 3.12, which tests the use of two instead of eight dual tiles.

For the communication between the PC and the FPGA, MatLab is used, as illustrated below in figure 3.1.

Figure 3.1. MatLab is used for communication between PC and FPGA

3.1 The New project Wizard

To start configuring a transceiver, a project has to be created in ISE. This project must have at least one source file (here a Verilog module), the top module, in which the transceiver is instantiated. The project is created by choosing ’New Project’ under ’File’ in the menu, which opens the ’New Project Wizard’. A project name, a location (where the project will be stored) and the source type of the top level must be chosen in the form. Here, HDL is used, see figure 3.19.

The next important step is to fill in the family of the FPGA, the device within the family, the package it comes in and its speed grade. The speed grade is chosen between -1 and -3, where -3 denotes the fastest grade (smallest gate delays) and -1 the slowest. It is important that the selected grade matches the actual FPGA, so that the delays in the real device are not longer than those assumed by the tools. Here -1 was chosen. Modelsim was chosen as simulator (but simulation was never used). Verilog was chosen as the language, see figure 3.20. After this, two pages follow: the first with an option to create a source, the second to add an already existing source. (Both of these alternatives are also available after the project has been created, under ’Source, New Source/ Add Copy of Source’.) Here it is described how a new source is created from the ’New Project Wizard’. After ’New Source’ has been marked, a new wizard, the ’New Source Wizard’, appears. This wizard starts by asking which type of source should be created, here a Verilog module named ’Top’, see figure 3.21. After this selection has been made, the number of inputs and outputs and the widths of these buses can be filled in. This is preferably done later, when the transceiver has been instantiated in the top module, by writing the inputs and outputs with their sizes at the head of the code describing the module. Before that stage it is impossible to know all the inputs and outputs that are needed and their sizes. After the new source has been created, the wizard asks where it should be used: in Implementation, Simulation, None or All. Here ’All’ was chosen.

If an existing module source is to be added to the project, it is important that the alternative ’Add Copy of Source’ is used instead of ’Add a Source’. With ’Add Copy of Source’, changes in the code of the file will not influence the original file. With ’Add a Source’, the original source file is used and changes to it will influence other projects using the same file. If the file is renamed, the other projects will no longer be able to use it.

Now that the project has been created, the core for the transceiver has to be generated. Under ’Source’ in the menu, ’New Source’ is chosen and then ’IP CORE Generator and Architecture Wizard’ in the ’New Source Wizard’, see figure 3.21. Under the folder ’FPGA Features and Design’, ’IO Interfaces’ and finally ’RocketIO GTP Wizard v1.8’ are selected. The form of the ’RocketIO GTP Transceiver Wizard’ opens and the selections are made as described in the next section.


3.2 The RocketIO GTP Transceiver Wizard

In this wizard the user can determine the design of the transceiver, the number of dual tiles that will be created, what clock domain will be used to source the USERCLKs etc. Here, it will be explained how some of the settings which are shared in all the projects are made.

In the wizard the user chooses how many GTP dual tiles will be created and which differential clocks will be used as reference clock/clocks (under ’REFCLK source’) for these tiles. Up to seven GTP dual tiles can share the same reference clock, and this option was chosen in all projects using fewer than eight dual tiles.

The ’Internal Data Width’ to be sent/received has to be decided and the transmission line rate has to be set. ’Target Line Rate’ was set to 2.4 Gbps. The speed of the ’Reference Clock’ was set to the recommended choice for the actual internal data width. For some configurations a predefined protocol can be chosen but, with the Σ∆-converter as a source, this option could not be used. A custom format had to be created and, as a consequence, all projects initiated in the wizard use the ’Start from scratch’ option instead of a predefined protocol in the ’Protocol Template’ area. The ’Silicon Version’ was set to ’PRODUCTIONS’. This means that the board has been well tested by the manufacturer.

Both transmitter and receiver are instantiated. Thus, under ’TX settings’ and ’RX settings’ the line rate is set to 2.4 Gbps and the data path to ten or eight bit. It is then possible to choose an ’Encoding’ but, since encoding is not supported by the Σ∆-converter, the alternative ’None’ is chosen here. For the GTP1 the ’Protocol Template’ was set to ’Use GTP0 Settings’. In the line rate section there is, for each dual tile, a possibility to turn off the RX side, the TX side or both. In most of this work the configuration was set as above, but if only the transmitter or the receiver is to be used, the other is set to ’No TX’ or ’No RX’. In most tests performed after the loopbacks had been checked, only one dual tile in the transceiver sends or receives data (GTP0). In this case, it can be favorable to disconnect the second dual tile (GTP1). Its ’Protocol Template’ is then set to ’Start from scratch’ and its ’Line Rate’ to ’No TX’ and ’No RX’, see figure 3.22.

The receiver user clock (RXUSRCLK) was synchronized with the transmitter user clock (TXUSRCLK) by using TXOUTCLK as a source for both the transmitter and receiver user clocks, see 1.3. In the wizard this is done by setting ’TXUSRCLK Source’ and ’RXUSRCLK Source’ to ’TXOUTCLK’.

It also has to be decided whether to use the TX/RX elastic buffer or the phase alignment circuit to minimize skew between the ’PMA Parallel Clock’ and ’PCS parallel Clock’ domains, see figure 2.8. It was not clear from the beginning whether the receiver could deliver 10 bits to the FPGA logic and, since the phase alignment circuit requires 10 bit data width, it was decided to use the TX/RX elastic buffer. As described in the background, the transmitter and receiver should have the same oscillator source (the same reference clock) to ensure that frequency differences do not appear between XCLK (RXRECCLK) and RXUSRCLK when ’clock correction’ cannot be used in the RX elastic buffer, as is the case when receiving unsorted data. It is not possible to use the 1.2 GHz clock in the serial part of the transmitter to drive the clock in the Σ∆-converter (see figure 2.2) and in this way avoid frequency differences between XCLK and RXUSRCLK. However, this problem could possibly be solved by other means, e.g., by using two synchronized signal generators, one producing a 1.2 GHz clock for the Σ∆-converter and one producing the 240 MHz clock for the dual tile, replacing the reference clock provided by the super clock module (see 2.4.1) [4]. Since the Σ∆-chip was not available during the time of this thesis work, no tests of such a solution could be carried out. It was therefore decided to configure the receiver on the assumption that it would be possible, in one way or another, to avoid the frequency differences between XCLK and RXUSRCLK. The possibility of using phase alignment (i.e., using the RXRECCLK to drive the RXUSRCLK) should be investigated as an option if the suggested configuration does not work. As a consequence of the above discussion, the TX/RX elastic buffer was used and selected by choosing ’Enable TX Buffer (default)’ in the wizard.

In order to have a way to reset either the receiver part of the transceiver or the transmitter, the RXRESET and TXRESET options were chosen in the wizard.

In every project, the ’Main driver differential swing’ and the ’Preemphasis level’ were chosen to be set manually so that they could easily be changed in the project. The ’Preemphasis boost’ was set, giving an increased swing of 10 percent. After this, there are different opportunities for different designs. These will be described in each specific case.

3.2.1 Generated files

When the ’RocketIO GTP Transceiver Wizard’ has created the core for the transceiver, a number of Verilog files are generated: modules describing the chosen tiles and their settings, and one module describing the top level of these tiles, which is the interface to the user. This interface has to be connected to the specific application in the code to turn the core into a fully working transceiver.

In addition, a UCF (User Constraint File) is created where all settings are stated. This UCF file has to be included in the projects. It is divided into two parts, one called the attribute file and one called the example file. The example file contains some information (the location of the dual tile and the clock instance) that has to be included in the attribute file to make it work. It is also useful to go through the UCF settings before using them in the design. For example, if more than one dual tile is created, the search path has to be changed because the UCF file can only describe the search path for one of the dual tiles. The UCF file is not perfect, so it is recommended to go through it and check that everything is as expected.

All signals that come to the FPGA or go out from it must be set as a NET in the UCF file. Both transmitter and receiver should be programmed to the same signaling standard for proper operation. Here the TXN, TXP and RXN, RXP pins were set to the ’LVCMOS12’ IO standard.

3.2.2 Clock Connections

To instantiate the transceiver/transceivers they first have to be created in the ’RocketIO GTP Transceiver Wizard’ (3.2) as desired. The reference clock is set in the wizard, i.e., the frequency that the reference clock of the dual tile must have to work properly. As described in the background (Chapter 2), physical settings have to be made at the ’DIP switch’ on the board to get this frequency, and cables must be connected from the ’super clock module’ to the ports feeding the reference source (’REFCLK source’), see 2.4. Finally, three more operations must be done in the code to make the clocks function.

1) The reference clock delivers a differential signal (MGTREFCLKP and MGTREFCLKN) to the dual tile. The dual tile requires a single-ended clock, which is obtained by implementing an IBUFDS as described in figure 3.2.

Figure 3.2. Connection of the differential reference clock [6]

2) As described in the ’RocketIO GTP Transceiver Wizard’ (3.2), the TXOUTCLK drives the TXUSRCLK and RXUSRCLK. Since this has been set in the wizard, the TXOUTCLK is found in the interface and has to be connected to a BUFG buffer to create the TXUSRCLK and RXUSRCLK, see figure 3.3.

3) The TXUSRCLK has to be connected to the TXUSRCLK2. The same procedure is used to connect RXUSRCLK to RXUSRCLK2. Now the TXOUTCLK feeds the user clocks as shown in figure 3.3.

Figure 3.3. An example of the connection of TXOUTCLK driving the TXUSRCLK and TXUSRCLK2 with 8 or 10 bit data width. The RXUSRCLK and RXUSRCLK2 have to be connected in the same way [6]

In all projects the peak-to-peak strength of the transmitted signal (the differential swing) has to be set. It can be chosen in eight steps between 0 and 1100 mV. In this work TXDIFFCTRL was assigned the value ’100’, which represents a differential swing of 800 mV. This should work properly without too much power loss.

When signals are transmitted between the transmitter and the receiver, the high frequencies are attenuated more than the lower frequencies. To compensate for this, pre-emphasis can be used, which decreases the amplitude of the low frequency signals. The pre-emphasis port TXPREEMPHASIS was assigned the value ’100’, which means that the pre-emphasis is 18.5 percent of the chosen differential swing. This value can be chosen between 3 and 52 percent in eight steps [6]. Too high a pre-emphasis, however, can cause distortion of the signals, so the 18.5 percent chosen was regarded as a fair middle course.

3.3 Loopbacks


To understand how the transceiver works, the first step was to run the loopbacks suggested by Xilinx. Xilinx suggests four loopbacks: the Near end PCS, the Near end PMA, the Far end PMA and the Far end PCS loopbacks [6]. In this project, the first three were tested. Once the third one, the Far end PMA loopback, has been tested, the whole transceiver with both the transmitter section and the receiver section has been tested, and further loopbacks are not needed to configure the transceiver.

To be able to run these loopbacks, the transmitter and the receiver were created as described in 3.2 above, and a top module was created where the two instances are instantiated and the ports set in the way the designer considers best. When own produced data was sent, the module producing it was instantiated here as well. It is also important in all the projects to connect RESETDONE, a signal that is set after the bit file has been loaded into the FPGA. This signal indicates that all resets necessary for the FPGA to work have been done. The PLL lock signal indicates that the PLL has been able to lock. If it has not, the reference clock may be operating at another rate than prescribed, or something else may be wrong in the transceiver. These two signals are preferably connected to two different LEDs (Light Emitting Diodes) so that the user can see that both are lit. If one of them is not lit, it is not likely that the transceiver works as desired.

3.4 Near-End PCS Loopback

Figure 3.5. Near End PCS loopback [6]

The first loopback tested was the Near-End PCS Loopback. This loopback tests the digital part of the transmitter and never involves the parallel-to-serial and the serial-to-parallel sections in the PMA block [6]. This loopback was tested both with PRBS data and with own produced data. In addition to the settings in the ’RocketIO GTP Transceiver Wizard’ (3.2), PRBS settings were implemented. The Near-End PCS Loopback is shown in figure 3.5.


3.4.1 PRBS

Initially one transceiver was created that could send and receive PRBS patterns. The internal data path then has to be set to 10 bit, and the PRBS transmission control as well as the PRBS detector are chosen in the ’RocketIO GTP Transceiver Wizard’. In the wizard it is also possible to choose the threshold for the PRBS error (the value that has to be reached before an error is indicated). In this project, the value of 255 has to be reached before the PRBS error is indicated. This number is set so that the PRBS error is not influenced by transient startup problems.

When the settings in the wizard have been made and the PRBS generator is used, the TXENPRBSTST port has to be assigned the value that enables the desired PRBS pattern. It is also important to enable the pattern checker, RXENPRBSTST, for the same pattern. When this was done, the TXDATA was assigned to send zeros so that the PRBS information would not be disturbed.

When a PRBS error occurs, the RXPRBSERROR port is asserted. This port is connected to a LED, which indicates when RXPRBSERROR goes high. The RXPRBSERROR port has to be reset before the LED turns off, even if no more errors are indicated. This can be done with GTPRESET, which resets the whole dual tile transceiver, with RXRESET, which resets the whole receiver part of the transceiver, or with PRBSCNTRESET, which resets the PRBS counter. Another port which does the same is RXCDRRESET, but this one is not implemented in this loopback (or in the project at all).
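The principle of a PRBS generator, a pattern checker and an error threshold can be modelled in Python as below. This is a sketch under stated assumptions and not a description of the GTP hardware. A PRBS-7 sequence (a seven bit shift register whose new bit is the XOR of its two oldest bits) is assumed because the text does not say which pattern was used. The checker is a generic self-synchronizing one, and all function names are hypothetical. Only the threshold value of 255 comes from the text.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of an assumed PRBS-7 pattern from a 7 bit shift register."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two oldest bits
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out


def count_prbs7_errors(bits):
    """Count received bits that differ from the value predicted from earlier bits."""
    errors = 0
    for n in range(7, len(bits)):
        if bits[n] != bits[n - 7] ^ bits[n - 6]:
            errors += 1
    return errors


def prbs_error_flag(error_count, threshold=255):
    """Raise the error flag only once the count has reached the threshold."""
    return error_count >= threshold
```

In this model one flipped bit is counted three times, once when it arrives and twice when it is used to predict later bits. Such a count stays far below 255 and would not raise the flag, which illustrates why a threshold suppresses isolated startup errors.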

The next step was to check that own produced data came through the PCS loopback channel. In this case, the wizard was completed as in 3.2 and a custom communication module was built as described below.


3.4.2 Own produced data

Figure 3.6. Communication block

This module (illustrated in figure 3.6) includes a counter that was connected to the output of the communication module and to the TXDATA port of the transmitter. The output data from the counter was clocked out to the transmitter with the transmitter's own clock, TXUSRCLK2, so that they were synchronized. The input signal from the receiver (RXDATA) was clocked in with the same clock as the receiver, RXUSRCLK2.

Because this communication module was built in the FPGA and was not part of the transceiver, its own reset function had to be implemented. The module was also given an enable function for the counters. When enable mode was on, both counters started.


The idea was that if the counter at the output sends a one, a one should also appear on the receiver side. Since the data stream is produced by a counter, it is also easy to know what the next value should be. This makes it possible to check the data against a copy of the counter that produces it. Because the counters start at the same time and the data has to travel from the transmitter to the receiver port (RXDATA), the counter that checks the incoming data will most probably not have the same value as the incoming data at the receiver port. Therefore, when the first data appears at the receiver port, the checking counter is synchronized with the incoming data. After that, the incoming data (at the RXDATA port) will be the same as the value of the checking counter if the link has no distortions. If the counter and the incoming data were the same, a LED representing correct data was switched on. Otherwise, another LED representing incorrect data was lit on the test board (see 2.4).

The counter checking the incoming data is synchronized once every cycle of the producing counter (every time the incoming data is a one). This means, for example, that if noise at the RXDATA port causes the counter on the receiver side to synchronize before the first sent data arrives, it is no problem because it will be synchronized again when the proper number one arrives.
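The checking scheme described above, a counter that is resynchronized whenever the value one is received, can be modelled in Python as follows. This is a behavioural sketch only, not the Verilog module used in this work. The function name, the eight bit width and the example latency in the usage note are assumptions.

```python
def check_counter_stream(rx_words, width=8):
    """Compare received words with a local counter, resynchronizing on the value 1.

    Returns one entry per received word: None before the first synchronization,
    otherwise True (word as expected) or False (word differs from the counter).
    """
    modulus = 1 << width
    expected = None
    results = []
    for word in rx_words:
        if word == 1:
            expected = 1  # (re)synchronize the checking counter
        if expected is None:
            results.append(None)  # not yet synchronized, nothing to compare
        else:
            results.append(word == expected)
            expected = (expected + 1) % modulus
    return results
```

If, for instance, the transmitted counter values arrive three clock cycles late, the first entries are `None` until the value one is seen. After that every entry is `True` as long as the link is error free, and a corrupted word shows up as a single `False`.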

Test

To have a way to search for errors in the configuration, connections etc., a switch was implemented such that the chosen loopback was selected when the switch was on and normal mode was selected when the switch was off. Since nothing connected the TXN, TXP drivers to the RXN, RXP pins, no data could come back in normal mode. This means that if own produced data was used and the LED signalled that everything was in order, an error should be indicated after the switch was turned off. Similarly, if the RXPRBSERROR LED was not lit when the switch was on, it should be lit after the switch was turned off.

3.5 Near-End PMA Loopback


The second loopback tested also includes the analogue part of the transceiver, so here the parallel-to-serial and the serial-to-parallel sections are involved. This did not make any difference when PRBS data was sent through the channels compared to what is described in 3.4. It did not even require another project, only a change of the loopback mode to Near-End PMA Loopback. For own produced data, however, some changes are needed, which are described below. The loopback is shown in figure 3.7.

3.5.1 Comma detect

When a project uses the comma detect function, the settings in the ’RocketIO GTP Transceiver Wizard’ are as in 3.2, but some new settings are also needed. Under ’RX Comma Alignment’, ’Use Comma Detection’ is chosen and the box ’Combine Plus/Minus Commas (double-length comma)’ is checked because using a double comma is safer. There are already comma suggestions in the ’Plus comma’ and ’Minus comma’ fields. These predefined commas are used in connection with 8b/10b encoding, which is the most common way to send data. With this encoding, the data to be sent goes through an encoding block in which each of the 256 eight bit words is represented by a ten bit code. Some ten bit combinations are then left over that do not represent any word, and commas are chosen among these. In this work, this feature was not used, so the commas have to be changed to words produced by the counter. The positive comma and the negative comma also need to be the inverse of each other. This makes the choice easy: ’1111111111’ for the ’Plus Comma’ and ’0000000000’ for the ’Minus Comma’.

Since the comma port is defined as a ten bit port, it can cause problems when an eight bit counter is used. This problem is solved with the ’Comma Mask’, which specifies the bits that must match. If ’Comma Mask’ is set to ’1111111111’, all ten bits in the comma have to match for the comma to be accepted. A zero in the comma mask means that the corresponding bit is a ’do not care’ bit. So for eight bit words, ’Comma Mask’ is set to ’0011111111’.
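The effect of the comma mask can be shown with a one-line comparison in Python. This is a minimal sketch of the masking idea only. The function name is hypothetical, while the comma and mask values are the ones given in the text.

```python
PLUS_COMMA = 0b1111111111
MINUS_COMMA = 0b0000000000
COMMA_MASK_8BIT = 0b0011111111  # the two upper bits are 'do not care'


def comma_matches(word, comma, mask):
    """Return True if `word` equals `comma` in every bit position where `mask` is 1."""
    return (word & mask) == (comma & mask)
```

With the eight bit mask, the word `0b0011111111` (the counter value 255 in the lower eight bits) matches the plus comma even though its two upper bits are zero. With the full mask `0b1111111111` the same word would not match.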

If a single comma is used, noise in the channel is more likely to trigger the comma detection than if a double comma is used. It has to be set in the wizard that both a positive and a negative comma will be used, even when double comma is chosen. Otherwise, the comma detect will look for two positive commas or two negative commas in a row. The boxes ’ENPCOMMAALIGN’ and ’ENMCOMMAALIGN’ are checked.

The following ’Optional Ports’ were chosen: RXCOMMADET, which goes high every time a comma is detected and then remains low until the next comma is detected; RXBYTEREALIGN, which indicates that the byte alignment in the serial data stream has changed due to comma detection; RXBYTEISALIGNED, which indicates that the data is properly byte aligned after comma detection, see figure 3.23.

3.6 Far-End PMA Loopback

Figure 3.8. Far End PMA loopback [6]

The third loopback test includes a second dual tile. It is important that the second part of the dual tile channel is not in use because it is not reliable in this mode. The loopback port of the first dual tile is set to normal operation and the second dual tile is set to far-end loopback. Now the entire dual tile will be tested in full application mode. The figure above shows the loop through the second tile. The core was created as in 3.2 and 3.5.1 for own produced data and as in 3.2 and 3.4.1 when PRBS patterns were used. At this stage of the loopbacks, the control method described in 3.4 under the heading ’Test’ cannot be used any more because all loopbacks are in use. Instead the TXINHIBIT port is chosen in the ’RocketIO GTP Transceiver Wizard’. When this port, which is assigned to a switch, is set high, the transmitter stops sending data and starts to send differential zeros. This port later proved helpful in many of the projects. The loopback is shown in figure 3.8.

3.7 Structure of each project

After all the loopbacks had been successfully carried out, the following projects were designed such that only the GTP0 was used in each dual tile. GTP1 was still defined in the ’RocketIO GTP Transceiver Wizard’ but was disconnected to ensure that cross talk would not appear. Later in the project, the use of GTP1 together with GTP0 was tested, see 3.12.

3.8 Sending and receiving data

After the loopbacks had been tested, it had to be verified that the communication was configured correctly, first over one link. Then it was interesting to see if there would be any problem in implementing this over two links, i.e., if there would be any interference between the links or some other unknown problems. The last step was to create a four link transmission to be able to test how the dual tiles should receive this data. At first, these links had their own testing modules. Later on, protocols were implemented and new possibilities to test the data became available.

3.8.1 Sending and receiving data over one link

The core for the two dual tiles was created as in 3.2, and as in 3.4.1 when PRBS data was sent and received. With self-generated data, the core was created as in 3.2 and 3.5.1 and used the logic block described in 3.4.2.

3.8.2 Sending and receiving data over two links

The core for the four dual tiles was created in the same way as for two dual tiles in 3.8.1 with the difference that there were two more dual tiles and the logic block described in 3.4.2 was duplicated so that there were two counter modules transmitting and checking if the data was coming through the channels without disturbance.
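The idea of one counter module per link, each generating data and checking what comes back, can be illustrated with a small software model. This is only a sketch under assumptions: the actual logic block is the one described in 3.4.2, and the names `counter_source` and `count_errors`, the 8 bit width and the injected bit error are hypothetical choices made for illustration.

```python
from typing import Iterable, Iterator

WIDTH = 8  # assumed data width in bits (10 bit words are also used in the thesis)


def counter_source(n_words: int, width: int = WIDTH) -> Iterator[int]:
    """Yield n_words values of a free-running counter that wraps at 2**width."""
    mask = (1 << width) - 1
    for i in range(n_words):
        yield i & mask


def count_errors(received: Iterable[int], width: int = WIDTH) -> int:
    """Count words that are not the previous word plus one (modulo 2**width)."""
    mask = (1 << width) - 1
    errors = 0
    previous = None
    for word in received:
        if previous is not None and word != ((previous + 1) & mask):
            errors += 1
        previous = word
    return errors


# Two independent links, each with its own source and its own checker.
link_a = list(counter_source(600))
link_b = list(counter_source(600))
link_b[100] ^= 0x01  # hypothetical disturbance: flip one bit on the second link

errors_per_link = [count_errors(link_a), count_errors(link_b)]
```

In this model a single corrupted word is flagged twice, once when it arrives and once when the following correct word no longer matches it, while the wrap from 255 to 0 is not counted as an error.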

3.8.3 Sending and receiving data over four links

Of the eight dual tiles, numbers zero, four, five and six were used as receivers and numbers one, two, three and seven as transmitters. The project was created as in 3.8.2 with one difference: the reference clock could not be the same for all eight dual tiles, so dual tile number zero was sourced by a different GTP clock than the others, running at the same rate.

On the test board, 16 LEDs are available. If every signal indicating that a PLL is locked and that a dual tile has completed all necessary resets (RESETDONE) were given its own LED, all LEDs would be occupied. Instead, two modules were created: one that checked that all PLLs were locked and one that checked that all of the dual tiles had asserted RESETDONE. Some simplifications were also made to GTPRESET, TXRESET and RXRESET. Every dual tile provides these resets. This was not changed, but the resets were tied together so that a single GTPRESET button reset the transceivers in all eight dual tiles, with one further button each for TXRESET and RXRESET. As a result, these resets needed only three buttons instead of 24.
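The reduction of status signals and reset buttons described above can be summarised in a few lines. This is a behavioural sketch only, not the HDL used in the project; the helper names `all_high` and `fan_out` and the example signal values are assumptions.

```python
N_TILES = 8  # dual tiles used in the four-link project


def all_high(signals):
    """Reduce one status bit per dual tile to a single LED value (logical AND)."""
    return all(signals)


def fan_out(button, n_tiles=N_TILES):
    """Drive the same reset input on every dual tile from one push button."""
    return [button] * n_tiles


# Hypothetical status: every PLL is locked, one tile has not finished its resets.
pll_locked = [True] * N_TILES
reset_done = [True] * N_TILES
reset_done[3] = False

led_pll = all_high(pll_locked)         # one LED for all PLL lock signals
led_reset_done = all_high(reset_done)  # one LED for all RESETDONE signals

resets = ("GTPRESET", "TXRESET", "RXRESET")
buttons_separate = len(resets) * N_TILES  # one button per reset per dual tile
buttons_shared = len(resets)              # one button per reset type
```

With separate controls, three resets on eight dual tiles would need 24 buttons. Tying each reset type together leaves three, and the two AND-reduced status signals occupy two LEDs instead of sixteen.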


[Figure 3.9, diagram labels: skew registration; communication module clocked by tile6_TXUSRCLK2 and tile6_RXUSRCLK2; example words 00000111 and 00000011; TX: tile1, tile2, tile3, tile7; RX: tile0, tile4, tile5, tile6]

Figure 3.9. Block diagram of the set up for skew testing

At the start, the communication module illustrated in figure 3.6 was used for each dual tile pair (TX and RX). This worked well, and the next step was to see whether the data came through the channels synchronously. In this case only one of these communication blocks was used. Its output signal was connected to all four transmitters so that a single source fed them. The receivers were clocked with TILE6_RXUSRCLK2, but they were also tested with the user clocks of all the other tiles. If the counter on the receiver side had equalled TILE6_RXDATA, TILE5_RXDATA, TILE4_RXDATA and TILE0_RXDATA at the same time, the data would have been received synchronously. This was not the case, although occasionally up to three of them were synchronous.

Because of this skew, the original idea of how to store and transfer the data to the PC (see 1.1) could not be used. Instead, the storing problem was solved as described in 3.9 (also see 4.1 for more details about how this could work with the Σ∆-chip).
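The synchrony check used in the skew test, comparing the receiver-side counter with the four RXDATA words in the same clock cycle, can be modelled as follows. This is a sketch under stated assumptions: the per-lane delays are invented for illustration (the text does not report the size of the skew), and `delayed_counter`, `synchronous_lanes` and `lane_skew` are hypothetical helper names.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def delayed_counter(n_cycles, delay):
    """Counter stream seen by a receiver whose lane adds `delay` clock cycles."""
    # None marks cycles before the first word has arrived.
    return [None if t < delay else (t - delay) & MASK for t in range(n_cycles)]


def synchronous_lanes(reference, lanes, cycle):
    """Names of the lanes whose word equals the reference counter in one cycle."""
    return [name for name, words in lanes.items() if words[cycle] == reference[cycle]]


def lane_skew(reference, lanes, cycle):
    """Cycles by which each lane lags the reference counter (modulo 2**WIDTH)."""
    return {name: (reference[cycle] - words[cycle]) & MASK
            for name, words in lanes.items()}


N_CYCLES = 64
# Assumed latencies in clock cycles; TILE0 arrives one cycle later than the rest.
lane_delay = {"TILE0": 5, "TILE4": 4, "TILE5": 4, "TILE6": 4}
lanes = {name: delayed_counter(N_CYCLES, d) for name, d in lane_delay.items()}
reference = delayed_counter(N_CYCLES, 4)  # receiver-side counter

in_sync = synchronous_lanes(reference, lanes, cycle=50)
skew = lane_skew(reference, lanes, cycle=50)
```

With these assumed delays, three of the four lanes match the reference in the sampled cycle and the fourth lags by one word. That mirrors the observation that up to three channels could be synchronous at a time.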
