
LiU-ITN-TEK-A--19/012--SE

Design and implementation of a high-speed PCI-Express bridge

Mandus Börjesson

Håkan Gerner


LiU-ITN-TEK-A--19/012--SE

Design and implementation of a high-speed PCI-Express bridge

Master's thesis in Electrical Engineering at the Institute of Technology, Linköping University

Mandus Börjesson

Håkan Gerner

Supervisor: Qin-Zhong Ye

Examiner: Adriana Serban


Upphovsrätt (Copyright)

This document is made available on the Internet – or its future replacement – for a considerable period from the date of publication, provided that no extraordinary circumstances arise.

Access to the document implies permission for anyone to read, download, print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Transfer of the copyright at a later date cannot revoke this permission. All other use of the document requires the consent of the author. To guarantee authenticity, security and accessibility, solutions of a technical and administrative nature are in place.

The author's moral rights include the right to be mentioned as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or in a context that is offensive to the author's literary or artistic reputation or integrity.

For additional information about Linköping University Electronic Press, see the publisher's website: http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/


Abstract

This master thesis will cover the prestudy, hardware selection, design and implementation of a PCI Express bridge in the M.2 form factor. The thesis subject was proposed by WISI Norden, who wished to extend the functionality of their hardware using an M.2 module.

The bridge fits an M-Key M.2 slot and has the dimensions 80x22 mm. It is able to communicate at speeds up to 8 Gb/s over PCI Express and 200 Mbit/s on any of the 20 LVDS/CMOS pins. The prestudy determined that an FPGA should be used and a Xilinx Artix-7 device was chosen. A PCB was designed that hosts the FPGA as well as any power, debugging and other required systems.

Associated proof-of-concept software was designed to verify that the bridge operated as expected. The software proves that the bridge works but requires improvement before the bridge can be used to translate sophisticated protocols.

The bridge works, with minor hardware modifications, as expected. It fulfills all design requirements set in the master thesis, and the FPGA firmware uses a well-established protocol, making further development easier.


Acknowledgments

We would like to thank the supervisor and examiner who took part in this Master thesis. Your feedback has proven valuable throughout the entire thesis work, not only helping us overcome obstacles but also helping us improve the quality of the final report.

We are grateful to our fellow students who helped motivate us, gave valuable input and were always eager to discuss ways to overcome our engineering challenges.

We also wish to extend a special thank you to our mentor Jonas Åberg. You have shared your many years of experience in hardware design and helped us boost the quality of the project beyond our expectations.


Contents

Abstract i

Acknowledgments ii

Contents iii

List of Figures v

List of Tables vii

1 Introduction 1
1.1 Aim . . . 1
1.2 Background . . . 2
1.3 Research questions . . . 3
1.4 Design requirements . . . 3
1.5 Delimitations . . . 4
2 Prestudy 5
2.1 System sketch . . . 5
2.2 Possible solutions . . . 6
3 Selection of hardware 10
3.1 FPGA selection . . . 10

3.2 Non-volatile memory selection . . . 13

3.3 Regulator selection . . . 13

4 System Description 16
4.1 M.2 Connector . . . 16

4.2 FPGA . . . 17

4.3 Programming and debugging . . . 18

4.4 Power management . . . 19

4.5 External connectors . . . 21

5 Hardware design and implementation 22
5.1 Stackup and design rules . . . 22

5.2 M.2 connector . . . 24

5.3 FPGA configuration banks . . . 25

5.4 FPGA I/O banks . . . 26

5.5 FPGA transceivers and PCI Express nets . . . 26

5.6 Power management . . . 26

5.7 Programming and debugging interface . . . 30

5.8 Decoupling . . . 31


6.1 Bill of materials . . . 33

6.2 Assembly and soldering . . . 33

6.3 Tests and inspection . . . 33

7 Software 37
7.1 EEPROM memory verification . . . 37

7.2 Flash memory interfacing . . . 38

7.3 FPGA firmware . . . 41

7.4 Platform application . . . 43

8 Results 44
8.1 PCB Layout . . . 44

8.2 Bridge performance . . . 45

8.3 Summary of hardware modifications . . . 46

8.4 Software design . . . 47

9 Discussion and conclusion 49
9.1 Design requirements and research questions . . . 49

9.2 Implementation . . . 49


List of Figures

2.1 High level sketch of the system. . . 6

2.2 System sketch of the solution using a CPU with native PCI Express support. . . 6

2.3 System sketch of the solution when using a CPU without native support for PCI Express. . . 7

2.4 System sketch of the solution when using a FPGA. . . 8

3.1 Total accumulated price for N number of units, assuming 100 units/year. . . 12

4.1 System layout on the M.2 module. . . 16

4.2 M.2 connector system connections. . . 17

4.3 FPGA system connections. . . 17

4.4 Programming and debugging system connections. . . 19

4.5 Power management system connections. . . 19

4.6 Power hierarchy option considered in the design. . . 20

4.7 M.2 connector system connections. . . 21

5.1 Illustration of design rules used during layout. . . 23

5.2 Open-drain logic level conversion. HV indicates High (3.3 V) logic level, LV indicates low (1.8 V) logic level. . . 24

5.3 Layout surrounding the M.2 connector on the top layer of the PCB. Notice the fencing and return path vias as well as the ground plane (yellow) on the layer below the PCI Express lanes. . . 25

5.4 Fanout of the SPI and JTAG nets (highlighted) under the Artix-7. . . 25

5.5 Power-on sequencing using RC networks. . . 27

5.6 I/O voltage sequencing and selectable I/O voltage solution. . . 28

5.7 Characteristic change of capacitance according to DC-voltage. . . 29

5.8 Artix-7 FPGA power pin mapping. . . 30

5.9 Programming and debugging interface with related components marked. . . 31

6.1 Load switch schematic . . . 34

6.2 Reference clock differential signal measured on the bridge. . . 36

6.3 PCI Express link lists the capabilities of the bridge. . . 36

7.1 Console output after a successful write to the EEPROM. . . 38

7.2 Enumeration of the SMBus of the bridge. Notice the SMBus-SPI bridge at ’0x28’ and the EEPROM at ’0x50’ and ’0x51’. . . 38

7.3 Ideal SPI data transfer. . . 39

7.4 SPI data transfer using the built-in CS pin feature. . . 40

7.5 Final SPI data transfer implementation. . . 40

7.6 Overview of the VHDL block design. . . 41

7.7 Platform application writing to register and validates its value. . . 43


8.2 Layout – Inner Layer 1 . . . 45

8.3 Layout – Inner Layer 2 . . . 45

8.4 Layout – Bottom . . . 45

8.5 Output signal at 100 MHz. . . 46

8.6 Output signal at 200 MHz. . . 46

8.7 The nPor fix. Right: Altium Designer screenshot with the modifications, Left: Same modification on the PCB. . . 47

8.8 The CLKREQ fix. Right: Altium designer screenshot with the modification, Left: Same modification on the PCB. . . 47

8.9 Signals; TX ready, data request, data in, data out, clock and state. . . 48


List of Tables

1.1 PCI Express transfer characteristics. . . 2

3.1 Product series candidates. . . 10

3.2 Selected devices from the different product families. . . 11

3.3 Artix-7 - Recommended DC and AC characteristics. . . 14

3.4 Texas Instruments LM26480 internal regulators. . . 15

4.1 Current flow and power loss in hierarchy option A. . . 21

4.2 Current flow and power loss in hierarchy option B. . . 21

5.1 JLC7628 layer stackup. . . 23

5.2 Routing rules used during the design of the bridge. . . 24

5.3 Optimum and designed power-on sequence for the different power nets. . . 27

5.4 Regulator SET_IO state to voltage relation. . . 28

5.5 Calculated values for peak current through the inductor and voltage ripple on the output, using: Vin = 3.3 V, η = 0.85, Iout,max = 0.8 A, Cout = 10 µF, L = 2.2 µH, f = 2 MHz. . . 29

5.6 Artix-7 Decoupling network for XC7A12T and XC7A35T . . . 31


Abbreviations

Abbreviation Meaning

AC Alternating Current

BGA Ball Grid Array

BOM Bill Of Materials

DC Direct Current

CAD Computer Aided Design

CLK Clock

CMOS Complementary Metal Oxide Semiconductor

CPU Central Processing Unit

DLL Data link layer

DSP Digital Signal Processor

EEPROM Electrically Erasable Programmable Read Only Memory

ESR Equivalent Series Resistance

FPGA Field Programmable Gate Array

GPIO General Purpose Input/Output

GT/s Gigatransfers per second

I2C Inter-Integrated Circuit

I/O Input/Output

IDE Integrated Development Environment

IP Intellectual Property

JTAG Joint Test Action Group

LDO Low Drop-Out regulator

LE Logic Elements

LUT Look-Up Table

LVDS Low Voltage Differential Signaling

MCU Micro Controller Unit

MISO Master In Slave Out

MMCM Mixed-Mode Clock Manager

MOSI Master Out Slave In

NC Normally Closed

NO Normally Open

NVS Non-Volatile Storage

PCB Printed Circuit Board

PCI Express Peripheral Component Interconnect Express

PHY PHYsical

PISO Parallel In - Serial Out

PLL Phase-Locked Loop

PMIC Power Management Integrated Circuit

RAM Random Access Memory

SerDes Serializer/Deserializer

SIPO Serial In - Parallel Out

SMBus System Management Bus

SMPS Switch-Mode Power Supply

SPI Serial Peripheral Interface

SPDT Single Pole Double Throw

SSI Synchronous Serial Interface

SSTL Stub-Series Terminated Logic

TL Transaction Layer

TLP Transaction Layer Packet

TTL Transistor-Transistor Logic

UI Unit interval

USB Universal Serial Bus


1

Introduction

As modern technology develops, high-speed interfaces and expandable design are becoming more prominent. Currently, one of the most widespread standards for expansion modules in personal computers is the Peripheral Component Interconnect Express (PCI Express) interface. The interface is usually accessible through expansion slots on the device and is, for example, used to improve performance or allow communication over interfaces that are not natively supported. Examples of modules that typically fit in expansion slots are hard drives, graphics processors, and ethernet cards.

While a variety of boards which extend the functionality of master devices through the M.2 connector already exist, most of them are focused on hardware acceleration of algorithms with limited connectivity options. They therefore lack the outputs needed to interface with other hardware. The main objective of this project is to add interface functionality to PCI Express capable devices with adaptability and upgradability in mind.

The purpose of this master thesis is to create an expansion board to extend the functionality of WISI Norden's current hardware. WISI Norden provides services for distribution of TV transmissions. They are currently using hardware that features an M.2 connector with a PCI Express interface. Throughout the report, two words will be used frequently when referring to the different hardware:

• Bridge refers to the final product of the master thesis. This includes the PCB and the components mounted on it as well as any software written.

• Platform refers to the board or connector that the bridge is supposed to be connected to. The bridge should act as a translator between PCI Express and any other protocol defined by the user. In other words, PCI Express is converted to a protocol and transmitted. If any messages are received, they are converted to PCI Express and sent to the platform.

1.1

Aim

The purpose of the master thesis is to design and implement a bridge that enables communication with hardware that does not have native PCI Express support. The bridge should convert PCI Express to Low Voltage Differential Signaling (LVDS) and/or CMOS logic. LVDS and CMOS cover the most frequently used solutions for communicating between different hardware: differential and single-ended. Furthermore, the CMOS/LVDS side should be easily reconfigurable in order to support different types, amounts and combinations of interfaces.

1.2

Background

This section will describe the theoretical background that is most relevant for understanding the general concepts regarding the M.2 standard as well as the PCI Express protocol.

M.2 connector

M.2 is a specification for connectors associated with computer-mounted expansion cards. It defines a standard for physical dimensions. It also defines different keying notches in order to prevent cards from being used in incompatible hosts. The M.2 standard is available in different configurations which are capable of PCI Express Gen 3.0, USB 3.0 and Serial ATA 3.0.

The PCI Express protocol

PCI Express is an interface designed for expansion boards such as video, sound and ethernet cards. It consists of one or multiple data lanes, where each lane can transfer data in both directions. Multiple data lanes can be utilized to improve the data throughput of a PCI Express link. However, more lanes correspond to a larger footprint and a more complex design. Transfer rates of the standard are seen in Table 1.1.

Table 1.1: PCI Express transfer characteristics.

Standard version    Transfer rate   Full duplex bandwidth (x1 / x2 / x4)
PCI Express 1.0     2.5 GT/s        250 MB/s / 500 MB/s / 1 GB/s
PCI Express 2.0     5 GT/s          500 MB/s / 1 GB/s / 2 GB/s
PCI Express 3.0     8 GT/s          1 GB/s / 2 GB/s / 4 GB/s
PCI Express 4.0     16 GT/s         2 GB/s / 4 GB/s / 8 GB/s
PCI Express 5.0     32 GT/s         4 GB/s / 8 GB/s / 16 GB/s

The PCI Express interface communicates through a link between a root complex (master) and an endpoint (slave) device. The platform that this project is developed against is a root complex. Hence, the bridge has to function as an endpoint device.

Layers

PCI Express is a layered protocol which consists of a physical layer, data link layer, and a transaction layer.

Transaction Layer

PCI Express relies on request and completion transactions to transfer data between devices. This is the primary task of the Transaction Layer (TL). The layer transmits request and completion transactions from the logic core and turns them into outgoing PCI Express packets. It receives incoming transactions from the Data Link Layer (DLL) and sends them to the core.

Data Link Layer

The data link layer's purpose is to detect and correct faults in the PCI Express data transmission. It continually monitors each PCI Express link and checks the data integrity of Transaction Layer Packets (TLPs) exchanged by devices.

Physical Layer

The Physical Layer (PHY) refers to the physical properties of the PCI Express link. Upon link establishment, the highest number of mutual lanes is selected to allow for the maximum compatible data rate. Each lane consists of a unidirectional transmitting and receiving differential pair, resulting in a total of 4 wires.

The physical layer should be able to determine the clock frequency from the data transmission. PCI Express Gen 1 and 2 utilize an 8B/10B encoding/decoding scheme which converts every 8 bits into 10 bits. The additional bits are placed where they generate a sufficient amount of rising and falling edges to properly establish the clock signal. The encoding/decoding scheme adds 20 % to the packet overhead. PCI Express Gen 3.0 and above implement a 128B/130B scheme with scrambling, which is far more efficient at approximately 1.5 % additional overhead.
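To make the impact of the line encodings concrete, the short sketch below derives the effective per-lane bandwidth figures of Table 1.1 from the transfer rate and the encoding overhead. It is a plain arithmetic illustration, not part of the thesis toolchain.

```python
# Effective per-lane PCI Express bandwidth from transfer rate and line encoding.
ENCODING = {"8b/10b": 8 / 10, "128b/130b": 128 / 130}

generations = [
    ("PCI Express 1.0", 2.5e9, "8b/10b"),
    ("PCI Express 2.0", 5.0e9, "8b/10b"),
    ("PCI Express 3.0", 8.0e9, "128b/130b"),
]

for name, transfers_per_s, code in generations:
    efficiency = ENCODING[code]
    bytes_per_s = transfers_per_s * efficiency / 8   # one direction, one lane
    overhead = (1 - efficiency) * 100
    print(f"{name}: {overhead:4.1f} % overhead, "
          f"{bytes_per_s / 1e6:6.1f} MB/s per lane per direction")

# PCI Express 1.0 gives 20.0 % overhead and 250 MB/s per lane, matching Table 1.1.
```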

1.3

Research questions

The master thesis is based on a set of questions presented below.

1. What are the alternatives to achieving PCI Express to LVDS/CMOS conversion?

2. Which alternative is expected to give the best result considering:

• Signal propagation delay.
• Data throughput.
• Component cost.
• Physical size of the design.
• Complexity.

3. What signal propagation delay and throughput can be expected of the system?

1.4

Design requirements

This section will cover the requirements for the design. The solution that is chosen should fulfill all of the requirements.

PCI Express interface

The design should be able to interface with at least a PCI Express Gen 1 x2 interface, meaning that there are two data lanes at 2.5 GT/s. Later generations are desirable but not necessary. The lane configuration (x2) must not change. The design should act as a PCI Express endpoint.

LVDS/CMOS interface

The interface should be able to support both LVDS and CMOS interfaces. The number of interfaces should be configurable to suit the application. Furthermore, each interface should be able to transmit data at up to 150 MHz.


Physical size and characteristics

The circuit board should have the dimensions 22x80x0.8 mm, which follows the standard size of M.2 expansion cards. If the size criterion cannot be fulfilled, the circuit board should still fit in WISI Norden's hardware. The M.2 connector should be M-key and the PCB should be 0.8 mm thick.

Power dissipation

The design will be housed in a small area with limited cooling options. The solution should not require active cooling.

1.5

Delimitations

The delimitations for this thesis are:

• The firmware for the solution should be seen as demo code, meaning that not all features have to be implemented.

• The CMOS/LVDS side of the bridge does not have to convert PCI Express to a known protocol, only show the differential/single-ended signals and their correspondence to the user input.


2

Prestudy

The prestudy consists of an analysis of available options that support the previously described functionality. The study includes an investigation of the pros and cons of the different solutions as well as a decision on which of the solutions is to be implemented.

The PCI Express standard and the data flow between devices were studied in order to get an understanding of what solutions were possible. When enough knowledge about the standard had been gained, different alternatives for achieving the desired functionality were investigated. The alternatives are presented below and compared in order to determine which one is best suited.

Aspects that weigh into the decision include, but are not limited to:

• The price of components
• Physical size of components
• Complexity of the solution
• Life cycle of key components
• Power requirements

2.1

System sketch

Conversion between PCI Express and CMOS/LVDS can be achieved through different techniques. The focus of the prestudy is to determine the most suitable method of acquiring the desired result while considering cost, complexity, physical dimensions and longevity.

The data will be transferred to the bridge via a high-speed PCI Express link where the following actions are performed:

1. PCI Express overhead is stripped from the data, which is sent as LVDS/CMOS to the recipients over a user-defined protocol.

2. The recipient answers, the overhead is reinserted into the data and sent to the root complex.

The bridge should have the ability to communicate with multiple devices with low latency. A sketch of the system is illustrated in Fig. 2.1, where the Bridge block represents the system that handles the PCI Express to LVDS/CMOS conversion.

Figure 2.1: High level sketch of the system.

The PCI Express endpoint requirement greatly limits the number of available solutions since a large portion of available devices only support root complex mode. Moreover, all alternatives must be able to perform some sort of data manipulation in order to convert data between the interfaces, further limiting the choices.

One idea was to use a PCI Express switch with multiple hardware bridges. A benefit of such a solution is that all interfaces would show up as separate endpoints. However, the concept was discarded since such a solution would be very difficult to reconfigure after construction and the number of hardware bridges that support different interfaces is limited. With all things considered, the pool of solutions was reduced to either a Central Processing Unit (CPU) or a Field Programmable Gate Array (FPGA).

2.2

Possible solutions

The focus of this section is to describe the alternative solutions and their advantages/disadvantages.

CPU

A CPU is often a good choice when data acquisition and processing is required. They are proven to work reliably and offer high clock-speeds, which is required for the implementation of the bridge. A system sketch of a bridge using a CPU is shown in Fig. 2.2. The light gray block SSI to LVDS/CMOS Bridge symbolizes an optional interface that allows the CPU to communicate with the external contact, if it is not able to do so directly.

Figure 2.2: System sketch of the solution using a CPU with native PCI Express support.

Many CPUs allow the use of an operating system such as Linux. This means that there are more alternatives when it comes to programming languages.

A drawback is that, while many CPUs support PCI Express, few of them can be used in an endpoint configuration. Two manufacturers that produce endpoint-capable CPUs are Texas Instruments and Broadcom. Another drawback is that CPUs commonly lack LVDS output functionality. Fortunately, many CPUs feature high-speed serial interfaces such as Quad SPI that allow them to be connected to external components that can act as LVDS/CMOS interfaces. The summary of the pros and cons of this system is presented below.

Pros:
- Easy to work with.
- Features an operating system.

Cons:
- Limited availability of endpoint-capable CPUs.
- Boot time.
- Uncommon, limited amount of sources.
- Real-time system, possibly introduces latency.
- Might require external hardware on LVDS/CMOS side.
- Complexity (design).

PCI Express to PHY interface

If the desired device does not support PCI Express, an Integrated Circuit (IC) with PCI Express to PHY bridge functionality can be employed between the PCI Express interface and the computing device, as illustrated in Fig. 2.3.

Figure 2.3: System sketch of the solution when using a CPU without native support for PCI Express.

The IC is used as an endpoint device to establish a link with the root complex. It includes a Serializer/Deserializer (SerDes), consisting of a Parallel In Serial Out (PISO) block and a Serial In Parallel Out (SIPO) block. The block converts data between a serial and a parallel interface in both directions.

The use of a PCI Express PHY bridge greatly improves the assortment of available CPUs, as the remaining layers of the PCIe standard are software based. However, there is a limited number of PHY bridges and the available options from the researched manufacturers are currently limited to PCI Express Gen 1. The summary of the pros and cons of this system is presented below.

Pros:
- Easy to exchange physical layer.

Cons:
- Requires a PCI Express to PHY IC to operate in endpoint mode.
- Need to implement DLL and TL.
- Not possible to upgrade the physical layer if manufacturers stop producing new solutions.


Programmable logic

Programmable logic, such as FPGAs, is commonly used for high-speed applications and computation acceleration. Many manufacturers offer value-line and low-power FPGAs that are suited for digital processing. The system sketch when using an FPGA is shown in Fig. 2.4.

Figure 2.4: System sketch of the solution when using a FPGA.

One benefit of using an FPGA is that they usually feature a large number (100+) of GPIOs, and the majority of them support differential as well as single-ended termination options. Some FPGAs also feature full hardware support for all PCI Express layers and are able to behave as endpoints. Due to their nature, an FPGA without hardware PCI Express support can be programmed to support it. The requirement is that they are able to communicate at PCI Express speeds, for example with the help of a transceiver. The transceivers are used in conjunction with Intellectual Property (IP) cores that implement the higher layers. While such a solution is highly configurable, it is at the expense of logic elements in the FPGA.

The boot time is determined by the FPGA configuration size as well as the programming interface speed. As a result, many smaller FPGAs are able to boot faster than the 100 ms start-up time criterion described by the PCI Express Base Specification [1].
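As an illustration of this trade-off, the sketch below estimates configuration time from bitstream size and configuration-interface speed. The 10 Mb bitstream matches the XC7A12T figure quoted in section 3.2, while the clock rates and bus widths are illustrative assumptions rather than measured values for this design.

```python
# Rough FPGA configuration-time estimate: bits / (clock * bus width).
BITSTREAM_BITS = 10e6          # XC7A12T bitstream size, from section 3.2
PCIE_STARTUP_LIMIT_S = 0.100   # 100 ms criterion from the PCI Express Base Specification

interfaces = [                 # assumed example interface speeds
    ("Master SPI x1 @ 3 MHz",  3e6,  1),
    ("Master SPI x1 @ 66 MHz", 66e6, 1),
    ("Master SPI x4 @ 66 MHz", 66e6, 4),
]

for name, clock_hz, width in interfaces:
    t = BITSTREAM_BITS / (clock_hz * width)
    verdict = "meets" if t <= PCIE_STARTUP_LIMIT_S else "misses"
    print(f"{name}: {t * 1e3:7.1f} ms ({verdict} the 100 ms criterion)")
```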

Since FPGAs are usually programmed to perform very specific tasks, low and predictable latency is achievable in a FPGA-based system as well as high throughput. Both of these aspects are desirable since the bridge should allow the root complex to communicate with multiple LVDS/CMOS interfaces at high speeds.

The main benefit compared to other solutions is that FPGAs can be programmed to support almost any interface. However, FPGAs lack the rich set of peripherals and software implementations of interfaces that a CPU offers.

One design consideration when using an FPGA is that a Non-Volatile Storage (NVS) is required for storing the configuration. Another important consideration is that since the operating principle of FPGAs is quite different from MPUs and CPUs, the programming also differs a lot. FPGAs are commonly programmed using Hardware Description Languages (HDLs) such as Verilog or Very High Speed Integrated Circuit HDL (VHDL), both of which differ a lot from sequential programming languages. The summary of the pros and cons of this system is presented below.

Pros:
- High number of GPIOs.
- Fast boot time.
- Many manufacturers provide fully functional PCI Express IP cores.
- Low latency and high throughput.

Cons:
- Requires external memory.
- Less intuitive programming.


Conclusion

Based on the previous analysis, a few key points can be made to rule out the different solutions:

CPUs: Powerful, but unnecessarily complex and may have long boot times. The lack of connectivity options also makes this solution poorly suited for the design. Another drawback is the lack of endpoint-capable CPUs, severely limiting the selection of possible hardware.

CPUs without endpoint support suffer from the same drawbacks, with the addition of relying on external PHYs. In short, neither of the two CPU solutions is suitable for the application.

Programmable logic: While programmable logic such as FPGAs may not be as intuitive for developers who are accustomed to CPU development, they do have a lot of the features desired in the application. FPGAs are commonly used in endpoint configuration and many manufacturers offer endpoint IP cores. The requirement of an external memory is compensated for by the large amount of connectivity options of an FPGA.

With consideration to the information gathered in this prestudy, it was decided that the bridge should be implemented using an FPGA.


3

Selection of hardware

Multiple manufacturers and product lines were compared to find a suitable FPGA as well as peripheral components. This section will discuss the process of finding, comparing and choosing the different hardware.

3.1

FPGA selection

In order to find the best-suited FPGA for the design, several product series from multiple manufacturers were compared. Four FPGA manufacturers were considered: Intel (previously known as Altera), Lattice Semiconductor, Microsemi and Xilinx. Initially, the different product series were investigated and later specific device packages. In order to qualify as a possible candidate for the initial selection, the product series should feature support for at least PCI Express Gen 1 x2. A list of candidates was compiled and is presented in Table 3.1.

Table 3.1: Product series candidates.

Manufacturer   Product family
Intel          Cyclone IV
Intel          Cyclone 10
Intel          Arria V GX/GZ
Intel          Stratix V GX
Lattice        ECP2
Lattice        ECP3
Lattice        ECP5/ECP5-5G
Microsemi      IGLOO2
Microsemi      SmartFusion 2
Xilinx         Spartan-6
Xilinx         Artix-7
Xilinx         Kintex-7
Xilinx         Virtex-7 XT
Xilinx         Kintex Ultrascale Plus


Product family

The device selection process focused on three main aspects: the number of transceivers, the number of logic elements (LEs) and the device speed grade. Transceivers are high-speed serial/parallel interfaces that operate at a higher frequency than the FPGA fabric. They are essential for PCI Express communication since the fabric itself is usually not capable of handling PCI Express speeds.

The device used must have at least two transceiver channels, and if the package is a ball grid array (BGA), it should have a ball pitch equal to or larger than 0.8 mm. The reason is that while a smaller-pitch device features more pins in a smaller package, the design tolerances become stricter, which in turn increases the PCB production cost.

The number of LEs determines how much application logic can be implemented on the FPGA. PCI Express IP cores usually take a portion of the logic cells, depending on the manufacturer, so they need to be taken into consideration¹.

For some manufacturers, the speed grade limits performance. Components need to have a speed grade high enough to support PCI Express in order to qualify for the selection. A list was compiled with possible candidates that fulfilled the requirements. The complete list is presented in Appendix A; Table 3.2 shows a summary of the list.

Table 3.2: Selected devices from the different product families.

Product family      Device code             Footprint   LE's   Transceivers
Cyclone IV [6]      EP4CGX30CF19C8N         F324        30k    6
Cyclone 10 GX [7]   10CX085YU484I6G         U484        85k    6
                    10CX220YF780E5G         F780        220k   12
Arria V GX [8]      5AGXMA5G4F35C4G         F672        75k    9
Arria V GZ [8]      5AGZME1E3H29C4N         H780        220k   12
Stratix V GX [9]    5SGXMA3E3H29C4N         F780        340k   12
ECP2 [10]           LFE2M20E-5FN256C        F256        19k    4
ECP3 [11]           LFE3-17EA-6FTN256C      ftBGA256    17k    4
ECP5 [11]           LFE5UM-25F-8BG381C      csBGA381    25k    2
                    LFE5UM-45F-8BG381C      csBGA381    45k    4
ECP5-5G [11]        LFE5UM5G-25F-8BG381C    csBGA381    25k    2
                    LFE5UM5G-45F-8BG381C    csBGA381    45k    4
IGLOO2 [12]         M2GL010T-1VF400         VF400       12k    4
                    M2GL025T-1VF400         VF400       25k    4
SmartFusion2 [13]   M2S010T-1VF400          VF400       12k    4
Spartan-6 [14]      XC6SLX25T-N3CSG324C     CSG324      24k    2
Artix-7 [15]        XC7A12T-2CSG325         CSG325      13k    2
                    XC7A15T-2CSG325         CSG325      16k    4
                    XC7A25T-2CSG325         CSG325      25k    4
                    XC7A100T-1FGG484        FGG484      101k   4
Kintex-7 [15]       XC7K70T-1FBG484         FBG484      65k    4
Virtex-7 XT [15]    XC7VX330T-1FFG1157C     FFG1157     326k   20

Device selection

The initial separation was done by setting a price cap of 100 USD/unit. The limit was chosen since the final design should be as cheap as possible. Any FPGA that was more expensive usually featured higher performance than necessary or had a poor price/performance ratio. This removed all Intel devices except the Cyclone IV and all Kintex/Virtex devices.

¹ Each manufacturer provides a separate PCI Express IP core. Intel and Microsemi implement the core in hard logic so no fabric resources are required [2][3], Lattice requires 12-16k LUTs [4], and Xilinx requires 1k LUTs [5, p.10-11].

Secondly, devices were removed if they were larger than 22x22 mm since they would not fit on the PCB. Older-generation devices such as the Lattice ECP3, Intel Cyclone IV and Xilinx Spartan-6 were also removed. The reason was that the newer generations are about as expensive but feature more modern hardware.

The remaining product series were the Lattice ECP5/ECP5-5G, Xilinx Artix-7 and Microsemi IGLOO2/Smartfusion 2. The ECP5 device was removed since it only supports PCI Express gen 1. The Smartfusion 2 device was removed because it is essentially the same device as the IGLOO2 but with an integrated microcontroller which was not needed.

In order to compare the remaining devices, an assumption was made that the bridge would require at most 10k logic elements (not including the PCI Express interface). Choosing the smallest possible device from each family, the choice is narrowed down to three devices:

Device Unit price [USD]

Lattice LFE5UM-25F-8BG381C 19

Microsemi M2GL010T-1VF400 37

Xilinx XC7A12T-2CSG325 36

The Lattice device stands out from the rest since it is significantly cheaper than the other devices. However, the ECP5-5G family requires a software license for both the IDE and the PCI Express IP core. The IDE costs 895 USD/year and the IP core is a one-time cost of 1800 USD. It also requires an external PLL for its transceivers, adding around 4 USD to the unit price. The Xilinx and Microsemi devices require no license for the IDE or the PCI Express IP core.

Figure 3.1 shows the accumulated price for each of the devices. The comparison assumes 100 units/year, meaning that the yearly licence cost recurs every 100 units.

Figure 3.1: Total accumulated price for N number of units, assuming 100 units/year.

As seen in Fig. 3.1, the break-even point between the Lattice and Microsemi devices is at about 500-600 units. The Xilinx device has a marginally lower cost per device compared to the others. Since the price difference was so small, any of the devices would have been a good choice for the design. To be able to make a decision, another aspect had to be taken into account.
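The comparison in Fig. 3.1 can be approximated with a few lines of arithmetic. The sketch below uses the unit prices, licence costs and 100 units/year assumption stated above; how often the yearly licence is counted is an assumption of this example, so the exact break-even point may differ slightly from the figure.

```python
import math

# Accumulated cost for N units, assuming 100 units/year (one IDE licence
# renewal per started batch of 100 units for the Lattice option).
def lattice_cost(n):
    ide_licences = math.ceil(n / 100) * 895   # 895 USD/year IDE licence
    ip_core = 1800                            # one-time PCI Express IP core licence
    units = (19 + 4) * n                      # device + external PLL per unit
    return ide_licences + ip_core + units

def microsemi_cost(n):
    return 37 * n

def xilinx_cost(n):
    return 36 * n

for n in (100, 300, 500, 700, 1000):
    print(f"N={n:5d}: Lattice {lattice_cost(n):7.0f} USD, "
          f"Microsemi {microsemi_cost(n):7.0f} USD, "
          f"Xilinx {xilinx_cost(n):7.0f} USD")
```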

The Xilinx Artix-7 XC7A12T-2CSG325 was considered the best candidate. The reason is that the developers at the company where the master's thesis is carried out all use Xilinx FPGAs, meaning that there is a great knowledge base for both hardware and software. The device naming parameters can be split up to describe the features of the unit. The name indicates that it is an Artix-7 series FPGA with 12k LUTs, speed grade 2 and package type C325 [15, p.16]. It is Xilinx's smallest Artix-7 FPGA but features the necessary components to accomplish the assignment. The C325 package is available in many configurations and is easily upgraded due to its pin compatibility with higher-end models of the same family. Speed grade 2 is required to communicate using PCI Express Gen 2.

3.2

Non-volatile memory selection

Since the FPGA fabric commonly consists of volatile memory, FPGAs are unable to retain their configuration when powered off. To resolve this problem, an external Non-Volatile Memory (NVM) is commonly used to store the configuration while power is lost. Other alternatives include loading the configuration from an external processor, or a hybrid of the two.

For the bridge design, an external memory was used for loading the configuration. The reason is that the physical size of the design is restricted and a processor would not add any extra functionality other than the configuration loading.

Flash memories are either NOR- or NAND-based. The techniques differ vastly in their operation. NOR flash provides sufficient address lines to map the entire memory range, which makes it possible to access random data within the memory. It features quick read times but relatively long write and erase times. 100 % bit correctness is ensured, making it excellent for code-execution applications. NAND flash has a higher bit-failure rate and requires error correction to determine if the data is valid. The advantages of NAND flash are high storage density and low cost. It is therefore primarily used in applications where large-capacity storage is required. The most reasonable flash technique to use as firmware storage in conjunction with an FPGA is, consequently, NOR flash.

The flash memory parameters were chosen based on price, capacity, and footprint. The MT25QL128 was a satisfactory choice since the family features multiple storage sizes and a standard footprint. It communicates using SPI, which is directly compatible with the FPGA. NOR flash devices with the same pin-out and footprint are available with capacities up to 256 Mb. The chosen Artix-7 with 12k LUTs has a bitstream size of 10 Mb while the 35k LUT device has a size of 18 Mb [16, p.14], both of which easily fit in the NVM.
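A quick capacity check, assuming the 128 Mb MT25QL128 variant, confirms that both the mounted device and a pin-compatible upgrade fit their bitstreams in the NVM with plenty of headroom:

```python
# Bitstream sizes from [16, p.14] versus a 128 Mb MT25QL128 NOR flash.
FLASH_MBIT = 128
bitstreams_mbit = {"XC7A12T": 10, "XC7A35T": 18}

for device, size in bitstreams_mbit.items():
    images = FLASH_MBIT // size
    print(f"{device}: {size} Mb bitstream, {FLASH_MBIT - size} Mb headroom, "
          f"room for {images} full images")
```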

WISI Norden desired a small I2C EEPROM with device information, tied directly to the SMBus (I2C compatible) interface. The EEPROM should hold enough information for the main device to identify the PCI Express bridge. The AT24CXXD is a small and cheap EEPROM with 1-16 Kb capacity in a SOT23-5 package. It complied with the necessary requirements and features a Write Enable (WE) pin, used to protect the user from accidentally overwriting the memory.
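For illustration, a minimal host-side sketch of reading such an ID EEPROM over SMBus is shown below. It uses the Linux smbus2 package; the bus number, the 0x50 device address and the assumption that the ID record starts at byte address 0 are choices made for this example, not a description of the final platform software.

```python
# Minimal sketch: read the first bytes of an ID EEPROM over SMBus/I2C.
# Assumptions: Linux host, smbus2 installed, EEPROM answering at 0x50 on bus 1,
# ID record stored from byte address 0 (single-byte addressing, AT24-style).
from smbus2 import SMBus

EEPROM_ADDR = 0x50
ID_LENGTH = 16

with SMBus(1) as bus:
    data = bus.read_i2c_block_data(EEPROM_ADDR, 0, ID_LENGTH)

print("ID record:", " ".join(f"{b:02x}" for b in data))
```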

3.3

Regulator selection

The Artix-7 FPGA requires several different voltages to operate. These are usually generated by a set of Switch-Mode Power Supply (SMPS) or Low-Dropout regulator (LDO) ICs. SMPS are highly efficient but induce electrical noise as a side effect. Low-dropout regulators are linear, which effectively means that they convert excess power to heat by altering their resistance relative to the load. This results in an inefficient power regulator but provides a clean voltage to the load.

Most voltage inputs of the FPGA are tolerant of the noise-levels produced by switching regulators. However, analog voltages related to the transceivers often require isolated and electrically clean power. SMPS are therefore mostly used when high currents are present or where the input to output voltage difference is high. An effective option to achieve clean, yet efficient, power conversion is to use a switching regulator in series with an LDO.

Typically, FPGAs require around 3-4 different voltages, with current draw varying greatly depending on FPGA size and application. It is, therefore, a demanding task to select a suitable voltage network which efficiently powers the FPGA. Since the current draw of the FPGA is directly related to the resources used in the user application, Xilinx provides a Power Estimator application. The application is used to calculate the estimated current draw based on the resources selected by the user.

The number of different voltages is related to the peripherals used within the device. For example, the transceiver channels require an isolated and stable source, while the core and other auxiliary circuits within the FPGA are more resistant to noise and voltage fluctuations. The recommended voltage levels for the Artix-7 FPGA, along with their tolerances and estimated current consumption, are presented in Table 3.3.

Table 3.3: Artix-7 - Recommended DC and AC characteristics.

The following values were calculated using: PCI Express Gen 2, 1x2 configuration, I/O Bank: 40 LVDS (2.5V) pairs @ 150 MHz, 100% LUT usage.

Symbol      Voltage [V]     Current [mA]   Tolerance              Supply description
VCCINT      1.0             600            ±3 % [17]              Internal power.
VCCBRAM     1.0             4              -                      Block RAM.
VMGTAVCC    1.0             200            ±10 mVpp [18, p.228]   GTP transceivers.
VCCAUX      1.8             100            ±5 % [17]              Auxiliary supply.
VMGTAVTT    1.2             150            ±30 mVpp [17]          GTP termination.
VCCO_15     1.14 to 3.465   300            -                      HR I/O banks.
VCCO_34     1.14 to 3.465   300            -                      HR I/O banks.

The current estimation is derived from the Xilinx Power Estimator (XPE). The combination of high current consumption for the core and low noise requirements for the transceiver peripherals demands an effective yet flexible power solution. To conform with the requirements, a combination of switching regulators and low-dropout linear regulators was regarded as the best solution.

The sensitive and low-power transceiver peripherals (VMGTAVTT and VMGTAVCC) are supplied from LDOs, which provide an electrically clean output, while VCCINT and VCCAUX are supplied from the SMPS. To simplify the task of selecting a suitable power solution, mainly for FPGAs and processors, manufacturers have developed Power Management Integrated Circuits (PMICs). A PMIC can refer to any IC which performs power conversion, but generally they are intended to include all of the necessary SMPS and LDOs within a single IC.

A fitting candidate is the TI LM26480, which is an IC consisting of two switching regulators and two low-dropout linear regulators [19]. The key features of the LM26480 are:

• Small footprint of 4x4 mm.
• Low price.
• Hardware-adjustable voltage levels on all regulators.
• Power good indicator.
• 2 MHz switching frequency, allowing the use of small external inductors and capacitors.

The voltage and current capabilities of the built-in regulators in the LM26480 can be seen in Table 3.4.

Table 3.4: Texas Instruments LM26480 internal regulators.

Regulator type   Output voltage [V]   Current [A]   Voltage drop
Switching        0.8 to 2             1.5           -
Switching        1 to 3.3             1.5           -
Low Dropout      1 to 3.5             0.3           25 mV (typical)
Low Dropout      1 to 3.5             0.3           25 mV (typical)

The I/O banks of the Artix-7 FPGA may be powered separately from each other. The idea is to power the I/O banks from their own LDO which provides the ability to handle multiple I/O standards. An example is to communicate with a 2.5 V interface on I/O bank 15 while I/O bank 34 runs on 3.3 V simultaneously, enabling the use of different I/O interface standards per bank.

All LDOs suffer from a small dropout voltage, Vdrop, which means that the maximum output voltage is limited to Vin − Vdrop. This is problematic when a 3.3 V output standard is desired and Vout needs to equal Vin. However, the Texas Instruments TLV758P solves this issue. Once Vin drops below Vout + Vdrop, it enters a "dropout mode", where the output voltage tracks the input voltage. It features a small footprint, a low price and an acceptable output current of 300 mA, along with an adjustable output voltage which is configured through a voltage divider.


4

System Description

This chapter will present an overview of the bridge architecture. The goal is to provide the reader with an easy reference as to how each part of the bridge operates and integrates with other parts. The design is composed of many smaller systems that operate independently or alongside other systems. A rough sketch of an M.2-module with a possible layout of the systems is shown in Fig. 4.1. The complete schematic, layout, BOM and mounting guides are presented in Appendix B.

Each system, left to right, will be presented in a separate section. In general, thicker lines represent buses or groups of signals and thinner lines represent single signals. Gray symbols represent physical parts, such as PCBs and components. Each internal pattern in Fig. 4.1 represents a separate system and the pattern for each section is consistent throughout the chapter.

Note that power nets are only shown in the M.2 and Power management system sketches.

Legend: M.2 connector, FPGA, Programming and debugging interface, Power management, External connectors.

Figure 4.1: System layout on the M.2 module.

4.1

M.2 Connector

The M.2 connector serves as the main interface with the platform. It is a finger-edge connector which provides both power and signaling to the bridge. The M.2 connector connects to the FPGA system, the Power management system, as well as the Programming and debugging interface system, as shown in Fig. 4.2.

Legend: FPGA, Programming and debugging interface, Power management.

Figure 4.2: M.2 connector system connections.

The PCI Express bus is the high-speed interface that is used for data exchange between the platform and bridge. It consists of two transmitting lanes, two receiving lanes and a clock lane. The PCI Express bus connects directly from the M.2 connector to the FPGA system.

The platform provides a SMBus on the M.2 connector. Since the platform SMBus operates at 1.8 V logic level and the programming and debugging interface SMBus operates at 3.3 V logic level, a logic level converter is required. The converter needs to convert 1.8 V signals to 3.3 V and vice versa.

The M.2 connector also provides 3.3 V power that is used to generate stable voltages for the bridge. The 3.3 V power from the card edge connector connects directly to the Power management system.

4.2

FPGA

The FPGA system interfaces with all other systems on the bridge. The main component of the system is the Artix-7 FPGA, which consists of several smaller parts such as transceivers and configuration banks. This section will describe how the FPGA system interfaces with the other systems according to the diagram in Fig. 4.3. A more in-depth description of the Artix-7 and its parts is presented in chapter 5.

Legend: M.2 connector, FPGA, Programming and debugging interface, Power management, External connectors.

Figure 4.3: FPGA system connections.

The SPI interface is used for loading new bitstreams to the FPGA. It is a 1x Synchronous Serial Interface (SSI), meaning it utilizes a single data channel in each direction. The bus connects to the Programming and debugging system along with the Done and Configure signals. The Done signal is used to verify that bitstreams successfully load into the FPGA and can be read over SMBus. The Configure signal triggers the loading of a new bitstream when pulsed low.

The SET_IO_X_N bus is used by the FPGA to control the LDO which regulates the bank voltage for the LVDS/CMOS I/O banks. This is done by pulling the signals low or putting the respective pins in high-Z mode. Chapter 5 describes the process in greater detail.

The LVDS/CMOS bus consists of 10 differential signal pairs. Each pair may be reconfigured to function as two independent single-ended signals, allowing for a maximum of 20 individually controllable signal pins. The LVDS/CMOS bus connects to the External connectors system.

4.3

Programming and debugging

For the programming interface, different paths for configuration were investigated. The first was to use the PCI Express interface in conjunction with a solution provided by Xilinx called PCI Express tandem. The solution was initially only named tandem and was designed to allow large bitstream devices to load the PCI Express core before loading the rest of the bitstream into the device. The entire bitstream is loaded in two stages from an external NVM and the idea was that the device should fulfill the 100 ms criterion set in the PCI Express specification.

PCI Express tandem allows the user to load the first stage from an NVM and the second stage to be sent to the device over PCI Express. The solution appeared promising since there was a limited amount of interfaces on the M.2 connector and it would allow easy exchange of the bitstream in the device. Unfortunately, this solution is not available in all Artix-7 devices. The smallest device that supports it is the XC7A75T [5, p.157]. The solution was abandoned since the device is not available in the chosen package.

The second path for configuration was the SMBus present on the M.2 connector. The interface is not fast enough to allow bitstream loading upon start-up¹ but it can be used to load new configurations to an on-board NVM. The FPGA can boot from the flash when it is powered on and, if a different configuration is desired, it can be loaded into the NVM and the FPGA reloaded with the bitstream.
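A back-of-the-envelope calculation shows why the SMBus path is only practical for updating the NVM, not for configuring the FPGA at start-up. The clock rates and the protocol-overhead factor below are assumptions made for the sake of the example:

```python
# Why SMBus cannot meet the 100 ms configuration window:
# time = bitstream bits / effective bit rate.
BITSTREAM_BITS = 10e6          # XC7A12T bitstream, from chapter 3
PROTOCOL_EFFICIENCY = 0.5      # assumed: addressing/ACK overhead roughly halves throughput

for name, clock_hz in [("SMBus @ 100 kHz", 100e3), ("Fast-mode I2C @ 400 kHz", 400e3)]:
    seconds = BITSTREAM_BITS / (clock_hz * PROTOCOL_EFFICIENCY)
    print(f"{name}: about {seconds:5.0f} s to transfer one bitstream")

# Both results are orders of magnitude above the 100 ms start-up criterion, but
# perfectly acceptable for occasionally writing a new image to the on-board NVM.
```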

The third path of configuration is a JTAG interface. It can be used not only for loading images to the FPGA and NVM, but also provides debugging functionality. The drawback of the interface is that it is not present on the M.2 connector meaning that it will be used mainly for debugging.

Fig. 4.4 shows a sketch of the signals present in the programming and debugging system. The SMBus connects to an ID NVM. The purpose is to provide a way of identifying the bridge, even if the system does not boot. It is connected directly to the SMBus, with nothing between it and the M.2 connector, in order to minimize the risk of failure. The NVM is write-protected and the protection can only be disabled by shorting a test pad on the PCB to ground.

The SMBus is also connected to an SMBus-to-SPI converter. The SPI bus of the converter is connected to the Normally Closed (NC) pins of a quad Single-Pole, Double-Throw (SPDT) analog switch. The Normally Open (NO) pins of the SPDT are connected to the FPGA and the Common (COM) pins are connected to an NVM. In order to select which device is connected to the NVM, a GPIO on the converter is used to control the SPDT through the IN signal.


Legend: M.2 connector, FPGA.

Figure 4.4: Programming and debugging system connections.

4.4

Power management

The power management system is shown in Fig. 4.5. The Power good signal indicates that all of the PMIC voltages are stable. When pulled low, the LDO and switch are enabled, powering the last sections of the bridge.

Legend: M.2 connector, FPGA, Programming and debugging interface.

Figure 4.5: Power management system connections.

Power tree

The goal of using SMPS’s and LDOs in conjunction is to minimize the total power loss of the bridge, allowing it to run without active cooling. This section will present two possible hierarchies for the power supplies and discuss the differences between them.

The first two voltage domains are the 1.0 V VCCINT/VCCBRAM and the 1.8 V VCCAUX. All systems connected to these voltages have a noise tolerance of 3 % or more, as seen in Table 3.3, meaning they can be sourced from the SMPS in the PMIC. In Fig. 4.6, these are represented by regulators 1 and 2.

Since the transceivers in the Artix-7 are sensitive to noise, it is preferable to use LDOs for their voltage feeds, VMGTAVCC and VMGTAVTT. The two LDOs present in the PMIC will be used, represented by regulators 3 and 4 in Fig. 4.6.

Note that regulator 5 is omitted from the power loss calculations. This is partly because it is a separate regulator, but also because the current and voltage for the I/O banks may vary greatly depending on the standard used as well as the speed and number of pins. The power hierarchy is illustrated in Fig. 4.6a and 4.6b.

(a) Option A: VMGTAVCC and VMGTAVTT sourced from 1.8 V. (b) Option B: VMGTAVCC and VMGTAVTT sourced from 3.3 V.

Figure 4.6: Power hierarchy options considered in the design.

Options A and B in Fig. 4.6 mainly differ in the amount of power wasted during operation. An overview of the power loss and current within the LM26480 is presented in Tables 4.1 and 4.2.

The SMPS regulator efficiency was chosen as 80 % to simulate a worst-case scenario and to provide some headroom if the current draw estimations are inaccurate. Typically, the efficiency of an SMPS should be between 85 and 90 %. For LDOs, Iin is the same as Iout. Rewriting the formula for efficiency, Eff, in (4.1), the efficiency of an LDO is given by Vout/Vin.

E_{ff} = \frac{V_{out} \cdot I_{out}}{V_{in} \cdot I_{in}}    (4.1)

For an SMPS, Iin is given by rewriting (4.1) into (4.2):

I_{in} = \frac{V_{out} \cdot I_{out}}{V_{in} \cdot E_{ff}}    (4.2)

Since Iin, Iout, Vin and Vout are now known for each regulator, the power loss can be calculated using the expected current draw presented in Table 3.3:

P_{loss} = P_{in} - P_{out} = I_{in} \cdot V_{in} - I_{out} \cdot V_{out}    (4.3)

Using (4.1) to (4.3), all currents, power losses and the total power loss can be calculated for each of the two options presented in Fig. 4.6. The results for options A and B are presented in Tables 4.1 and 4.2, respectively.


                 Regulator 1   Regulator 2   Regulator 3   Regulator 4   Total
Type             SMPS          SMPS          LDO           LDO           -
Vin [V]          3.3           3.3           1.8           1.8           -
Vout [V]         1.0           1.8           1.0           1.2           -
Iin [A]          0.23          0.31          0.20          0.15          1.14
Iout [A]         0.6           0.45          0.20          0.15          -
Efficiency [%]   80            80            56            67            -
Ploss [W]        0.15          0.20          0.16          0.09          0.60

Table 4.1: Current flow and power loss in hierarchy option A.

                 Regulator 1   Regulator 2   Regulator 3   Regulator 4   Total
Type             SMPS          SMPS          LDO           LDO           -
Vin [V]          3.3           3.3           3.3           3.3           -
Vout [V]         1.0           1.8           1.0           1.2           -
Iin [A]          0.23          0.07          0.20          0.15          1.25
Iout [A]         0.6           0.1           0.20          0.15          -
Efficiency [%]   80            80            30            32            -
Ploss [W]        0.15          0.05          0.46          0.32          0.97

Table 4.2: Current flow and power loss in hierarchy option B.

As seen in Tables 4.1 and 4.2, option A is the best when considering power loss. The hierarchy was simulated using Texas Instruments Tina-TI and the provided model for the LM26480. During simulation, the LDO appeared to reverse-feed the SMPS, causing unpredictable behaviour. Since the design should work and it was unknown whether the problem was caused by the simulation model, the final design includes an option that allows switching between the two hierarchies using 0 Ω resistors.
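The figures in Tables 4.1 and 4.2 follow directly from Eqs. (4.1)-(4.3) and the currents in Table 3.3. The short sketch below reproduces them and can be rerun if the current estimates change; the 80 % SMPS efficiency is the same worst-case assumption as above.

```python
# Reproduce Tables 4.1 and 4.2 from Eqs. (4.1)-(4.3).
# Each regulator is described by (type, Vin, Vout, Iout); SMPS efficiency assumed 80 %.
SMPS_EFF = 0.80

def losses(regulators):
    total = 0.0
    for kind, vin, vout, iout in regulators:
        if kind == "SMPS":
            iin = vout * iout / (vin * SMPS_EFF)     # Eq. (4.2)
        else:                                        # LDO: Iin equals Iout
            iin = iout
        ploss = iin * vin - iout * vout              # Eq. (4.3)
        total += ploss
        print(f"  {kind}: Vin={vin} V, Vout={vout} V, Ploss={ploss:.2f} W")
    print(f"  total loss: {total:.2f} W")

print("Option A (LDOs fed from 1.8 V):")
losses([("SMPS", 3.3, 1.0, 0.60), ("SMPS", 3.3, 1.8, 0.45),
        ("LDO", 1.8, 1.0, 0.20), ("LDO", 1.8, 1.2, 0.15)])

print("Option B (LDOs fed from 3.3 V):")
losses([("SMPS", 3.3, 1.0, 0.60), ("SMPS", 3.3, 1.8, 0.10),
        ("LDO", 3.3, 1.0, 0.20), ("LDO", 3.3, 1.2, 0.15)])
```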

4.5

External connectors

The bridge has two sets of external connectors, each connected to a group of differential pairs as shown in Fig. 4.7. All pairs connect to the FPGA system. The external connectors also feature mounting pads for external termination at the connector since testing equipment may not be terminated correctly.

Legend: FPGA.


5

Hardware design and implementation

This chapter will present the ideas and considerations during the design process of the bridge.

5.1

Stackup and design rules

JLCPCB was chosen for manufacturing the PCB. While they may not have as many options as other manufacturers in their PCB stackup, they simplify the design process by providing tools for calculating line impedance both for single-ended and differential traces for all of their stackups.

In order to ensure that the PCB can be produced, the design has to follow the manufacturer's rules and use a stackup that JLCPCB can produce. This section will present the stackup and rules that were used for the design.

Stackup

The design uses the JLCPCB 0.8 mm JLC7628 stackup shown in Table 5.1. While the manufacturer offers other stackups with a higher dielectric constant and thinner prepregs, JLC7628 was chosen due to its thin core, which increases the inter-plane capacitance between the two inner layers where the majority of the ground and power planes are located. The trade-off is a poorer coupling between the outer and inner layers, slightly increasing the width of impedance-matched traces.

Table 5.1: JLC7628 layer stackup.

Layer Material type Thickness (mm) Dielectric constant

Top solder mask Solder mask 0.0127-0.0203 3.8

Top copper Copper 0.035

Prepreg 7628 0.2 4.6

Inner copper 1 Copper 0.0175

Core 0.265

Inner copper 2 Copper 0.0175

Prepreg 7628 0.2 4.6

Bottom copper Copper 0.035

Bottom solder mask Solder mask 0.0127-0.0203 3.8
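As a rough sanity check of the inter-plane-capacitance argument above, a parallel-plate estimate for the two inner copper layers separated by the 0.265 mm core is sketched below. Treating the full 22 x 80 mm board outline as plane area and assuming a core permittivity similar to the prepreg are idealizations; real plane shapes and cut-outs reduce the value.

```python
# Parallel-plate estimate of inter-plane capacitance: C = eps0 * eps_r * A / d.
EPS0 = 8.854e-12           # F/m
EPS_R = 4.6                # assumed for the core; Table 5.1 only lists 4.6 for the prepreg
AREA = 0.022 * 0.080       # full 22 x 80 mm board outline (idealization), m^2
CORE_THICKNESS = 0.265e-3  # m, from Table 5.1

c_plane = EPS0 * EPS_R * AREA / CORE_THICKNESS
print(f"Estimated inter-plane capacitance: {c_plane * 1e12:.0f} pF")

# A thinner dielectric between the planes scales this capacitance up linearly,
# which is the motivation for choosing the thin-core stackup.
```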

Design rules

The rules that were used in the design are presented in Table 5.2. The majority are the recommendations from the manufacturer, with the exception of the rules related to differential pairs.

The Differential trace width and Differential pair gap were calculated using the manufacturer's online tool; all differential lines on the PCB are designed for a nominal differential impedance of 100 Ω.

The Length difference within pair rule was chosen small because the line-to-line skew directly influences signal integrity. While perfect length matching is desirable, it is not always possible due to manufacturing defects, etc. The designer should strive for perfect length matching, but realize that minor differences are negligible. Another important note is that any differences that appear as a result of turns or connections should be fixed as close to the source of the difference as possible, as shown by point I in Fig. 5.1. The reason is that the noise rejection of differential signals is best when the two signals are in phase.
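To put the 5 mil (0.127 mm) matching rule in perspective, the sketch below converts the allowed mismatch into a time skew and compares it to one unit interval. The propagation delay per millimetre is an assumed typical FR-4 value, not an extracted figure for this stackup.

```python
import math

# Convert the intra-pair length-matching rule into a time skew.
MISMATCH_MM = 0.127     # max length difference within a pair (Table 5.2)
EPS_R_EFF = 4.0         # assumed effective permittivity for an outer-layer trace
C_MM_PER_PS = 0.2998    # speed of light, mm/ps

delay_ps_per_mm = math.sqrt(EPS_R_EFF) / C_MM_PER_PS
skew_ps = MISMATCH_MM * delay_ps_per_mm

for name, gt_per_s in [("Gen 1 (2.5 GT/s)", 2.5e9), ("Gen 2 (5 GT/s)", 5e9)]:
    ui_ps = 1e12 / gt_per_s
    print(f"{name}: UI = {ui_ps:5.0f} ps, worst-case skew = {skew_ps:.1f} ps "
          f"({100 * skew_ps / ui_ps:.2f} % of UI)")
```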

The Copper-Differential pair clearance rule ensures that the high speed differential pairs have proper signal integrity. As a rule of thumb, a clearance larger than 2 times the pair gap should be used to ensure that the coupling to nearby copper is negligible. The design has a 3 times larger clearance compared to the gap.

[A] Via hole size
[B] Via annular
[C] Copper-Differential pair clearance
[D] Differential trace width
[E] Differential trace gap
[F] Copper-Copper clearance
[G] Routing trace width
[H] Solder mask sliver
[I] Length difference within pair

Figure 5.1: Illustration of design rules used during layout.


Table 5.2: Routing rules used during the design of the bridge.

Rule                                              | min (mil) | nom (mil) | max (mil) | min (mm) | nom (mm) | max (mm)
Clearance: Copper-Differential pair               | 12        | -         | -         | 0.3      | -        | -
Clearance: Copper-Copper                          | 4         | -         | -         | 0.1      | -        | -
Trace width: Routing trace                        | 4         | 6         | -         | 0.1      | 0.15     | -
Vias: Via hole size                               | 8         | 16        | -         | 0.2      | 0.4      | -
Vias: Via annular size                            | 18        | 31        | -         | 0.45     | 0.8      | -
Differential pairs: Trace gap                     | 4         | 4.5       | 5         | 0.1      | 0.114    | 0.127
Differential pairs: Trace width                   | 4.96      | 5.54      | 6.05      | 0.126    | 0.141    | 0.154
Differential pairs: Length difference within pair | -         | -         | 5         | -        | -        | 0.127
Solder mask: Sliver                               | 4         | -         | -         | 0.1      | -        | -

5.2 M.2 connector

As discussed earlier, the SMBus logic level of the platform is lower than the logic level of the programming interface, so a level converter must be used. Since SMBus uses an open-drain configuration, the circuit in Fig. 5.2 can be used for the conversion. This solution was used since it requires a small number of components, resulting in a small overall footprint.

Figure 5.2: Open-drain logic level conversion. HV indicates the high (3.3 V) logic level, LV indicates the low (1.8 V) logic level.

Layout

The main consideration during layout was to keep the high-speed traces isolated from other signals, such as the SMBus, as well as from other high-speed signals in order to maintain signal integrity. For the PCI Express signals, the metal directly below the signals is a ground plane, as seen in Fig. 5.3. Vias were also added in close proximity to layer changes of the signals to provide a good return path for the signal. Additionally, fencing vias were added around the traces where possible to further shield the high-speed signals.



Figure 5.3: Layout surrounding the M.2 connector on the top layer of the PCB. Notice the fencing and return path vias as well as the ground plane (yellow) on the layer below the PCI Express lanes.

5.3 FPGA configuration banks

The configuration banks refer to Bank 0 and 14 in the Artix-7. These banks are used for configuration of the FPGA as well as setting the voltage levels for the I/O banks.

The JTAG interface on Bank 0 is used for debugging and loading new bitstreams to the NVM through the SPI interface. This interface has a separate connector on the PCB and is not connected to the M.2 connector.

The SPI interface is used for loading bitstreams from the NVM. The default configuration interface is set through the Mx pins on Bank 0. In the design, M[2..0] = 001, corresponding to Master SPI [16, p. 21]. This interface is used at power-on, but may be overridden by the JTAG interface.

Three I/Os on Bank 14 are used to control the voltage level of the remaining I/O banks. This is achieved by altering the feedback loop of an LDO by pulling the respective I/Os low or putting them in a high-Z state. This is described in greater detail in Section 5.6.

Layout

Since most of the configuration pins on the Artix-7 have fixed positions in the pinout, the main consideration was to avoid layer changes as much as possible. Examples of how the nets were fanned out under the Artix-7 are shown in Fig. 5.4. The LDO control pins that had no fixed position were moved close to the LDO to make routing easier.



5.4 FPGA I/O banks

The I/O banks refer to Bank 15 and 34 in the Artix-7. These banks are both connected to the LVDS/CMOS connectors. Each bank has 10 pins available at the connector that may be used as either 10 single-ended pins or 5 differential pairs. Due to space constraints, one of the pairs on Bank 34 was only routed to a terminating resistor close to the Artix-7.

Layout

All pairs were routed and pairwise length-matched within 0.127 mm (5 mil) with a nominal differential impedance of 100 Ohm. The clearance between pairs was kept to a minimum of 3 times the spacing within a pair to reduce cross-talk.

5.5 FPGA transceivers and PCI Express nets

The PCI Express lanes 0 and 1 of the M.2 connector were connected to the transceivers of the Artix-7. To simplify the routing, lane 0 was connected to transceiver 1 and lane 1 to transceiver 0.

To reduce the complexity and number of external components of the system, the PCI Express reference clock will not only be used for the transceivers, but also to clock the core logic of the FPGA.

For the PCI Express lanes, the PCI Express Base specification states that any transmitting element should have AC coupling capacitors in the range 75-265 nF for 5 GT/s applications [1, p. 357]. 100 nF capacitors were added to the TX and CLK lanes.

Layout

Since the PCI Express lanes and reference clock are both differential signals, special consideration must be taken to their relative length. The most important consideration is the difference between the positive and negative signal inside a lane, since it directly influences signal integrity.

The largest allowed lane-to-lane skew is 1.3 ns for TX and 8 ns for RX when UI is the nominal unit interval of 200 ps [1, pp. 357, 378, 380]. Assuming a phase velocity of 0.67c, this corresponds to 18 cm for TX and 160 cm for RX. TX lanes are completely independent from RX lanes, so they need not be matched against each other. The characteristic impedance for all lanes is 80-120 Ω according to the PCI Express Base specification [1, pp. 355, 379]. During layout, pairs were designed to the nominal differential impedance of 100 Ω.

5.6 Power management

The Artix-7 datasheet provides the optimal power-on sequence [20, p. 8] to minimize current consumption. Turn-on sequencing of the regulators was therefore added to minimize the FPGA start-up current draw. The sequencing was accomplished by RC networks connected to the regulator enable pins, seen in Fig. 5.5, where the delay is determined by the rise time of the supply voltage, calculated using (5.1). The optimal turn-on sequence is shown beside the designed turn-on sequence in Table 5.3.
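As a reminder of how such enable delays scale, a first-order RC step response can be assumed for each enable pin. This is a reconstruction, since (5.1) is not reproduced here and the exact expression used in the design may differ. With an enable threshold V_th and supply voltage V_supply, the delay becomes

V_{EN}(t) = V_{supply}\left(1 - e^{-t/(RC)}\right) \;\Rightarrow\; t_{delay} = -RC \cdot \ln\!\left(1 - \frac{V_{th}}{V_{supply}}\right)

so, for example, a threshold at half the supply voltage corresponds to a delay of roughly 0.69 RC.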


Table 5.3: Optimum and designed power-on sequence for the different power nets.

Step | Optimum  | Design
1    | VCCINT   | VCCINT, VCCBRAM
2    | VMGTAVCC | VCCAUX
3    | VMGTAVTT | VMGTAVCC
4    | VCCBRAM  | VMGTAVTT
5    | VCCAUX   | VCCO
6    | VCCO     |

The designed sequence does not follow the optimum sequence exactly, but addresses the most critical parts, which are the transceiver analog supply voltage, VMGTAVCC, and its termination voltage, VMGTAVTT.

Figure 5.5: Power-on sequencing using RC networks.

The nPOR pin of the PMIC is a power-good indicator. It is driven low when either of the buck regulators is below 92 % of its desired output voltage. When the outputs are considered stable, there is a 60 ms delay until nPOR is pulled high. The power-good signal is used to enable the two FPGA I/O bank voltages through an N-MOS transistor and an enable signal, as illustrated in Fig. 5.6.



Figure 5.6: I/O voltage sequencing and selectable I/O voltage solution.

I/O voltage regulation

FPGA I/O voltage is regulated by a stand-alone adjustable LDO. The adjustability is achieved using software-controlled feedback networks, illustrated in Fig. 5.6. The voltage of the I/O regulator is set by resistor dividers and is calculated using (5.2).

V_{out,LDO} = V_{ref} \cdot \left(1 + \frac{R1}{R2}\right), \quad V_{ref} = 0.55\ \text{V} \qquad (5.2)

The divider network essentially alters the R2 value. Since the GPIOs are tri-state capable, a single resistor divider can be selected by setting one output to LOW while the rest are high impedance (High-Z), resulting in the pattern described in Table 5.4.

Table 5.4: Regulator SET_IO state to voltage relation.

SET_IO_1V8_N SET_IO_2V5_N SET_IO_3V3_N VCCO_15_34_ADJ[V]

High-Z High-Z High-Z 1.2

High-Z High-Z LOW 1.8

High-Z LOW High-Z 2.5

LOW High-Z High-Z 3.3
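The relation in Table 5.4 follows directly from (5.2) once each SET_IO pin is seen as switching a different effective R2 into the feedback divider. The sketch below is only an illustration: R1 and the effective R2 values are hypothetical numbers chosen to reproduce the table, not the resistor values used on the board.

# Illustrative sketch: map the tri-state SET_IO pin states of Table 5.4 to an
# assumed effective R2 and evaluate the LDO output with equation (5.2).
V_REF = 0.55  # LDO reference voltage [V], from (5.2)

def ldo_vout(r1_ohm: float, r2_ohm: float) -> float:
    """Adjustable LDO output voltage according to (5.2)."""
    return V_REF * (1.0 + r1_ohm / r2_ohm)

R1 = 100e3  # hypothetical top feedback resistor [ohm]
# (SET_IO_1V8_N, SET_IO_2V5_N, SET_IO_3V3_N) -> assumed effective R2 [ohm]
EFFECTIVE_R2 = {
    ("High-Z", "High-Z", "High-Z"): 84.6e3,  # -> ~1.2 V
    ("High-Z", "High-Z", "LOW"):    44.0e3,  # -> ~1.8 V
    ("High-Z", "LOW",    "High-Z"): 28.2e3,  # -> ~2.5 V
    ("LOW",    "High-Z", "High-Z"): 20.0e3,  # -> ~3.3 V
}

for (set_1v8, set_2v5, set_3v3), r2 in EFFECTIVE_R2.items():
    print(f"SET_IO_1V8_N={set_1v8:6s} SET_IO_2V5_N={set_2v5:6s} "
          f"SET_IO_3V3_N={set_3v3:6s} -> VCCO = {ldo_vout(R1, r2):.2f} V")

Running the sketch prints the same four voltages as Table 5.4, which is a convenient sanity check when choosing real resistor values.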

Passive components

The PMIC circuit and the selection of external components were based on the Typical Application design provided in its datasheet. The datasheet also presents a formula for calculating the output voltages of the four individually controlled regulators, seen in (5.3) [21].

V_{out} = V_{ref} \cdot \frac{R1 + R2}{R2}, \quad V_{ref} = 0.5\ \text{V} \qquad (5.3)

External components were selected to suit the output voltages and their approximated current draw. Special consideration was given to capacitor DC-bias effects and inductor current saturation. The ripple and peak currents of the SMPS are highly dependent on the output filtering components. Equations (5.4)-(5.8) are provided by TI [21, p. 27]. They allow the designer to predict the current and voltage ripple based on the selected filtering components, the expected current draw and the switching frequency. Table 5.5 shows the ripple that can be expected for the design.

I_{L,peak} = I_{out,max} + \frac{I_{ripple}}{2} \qquad (5.4)

I_{ripple} = \frac{V_{out}}{V_{in} \cdot \eta} \cdot \frac{V_{in} - V_{out}}{L \cdot F} \qquad (5.5)

V_{Cout} = \frac{I_{ripple}}{8 \cdot F \cdot C_{out}} \qquad (5.6)

V_{Rout} = I_{ripple} \cdot ESR_{Cout} \qquad (5.7)

V_{out,p-p} = \sqrt{V_{Cout}^2 + V_{Rout}^2} \qquad (5.8)

Table 5.5: Calculated values for the peak current through the inductor and the voltage ripple on the output, using V_in = 3.3 V, η = 0.85, I_out,max = 0.8 A, C_out = 10 µF, L = 2.2 µH, F = 2 MHz.

Parameter  | Condition     | ESR_Cout = 0.1 Ω | 0.05 Ω | 0.01 Ω | Unit
I_L,peak   | V_out = 1.0 V | 0.89             | 0.89   | 0.89   | A
I_L,peak   | V_out = 1.2 V | 0.90             | 0.90   | 0.90   | A
V_out,p-p  | V_out = 1.0 V | 18.67            | 9.39   | 2.20   | mV
V_out,p-p  | V_out = 1.2 V | 20.46            | 10.29  | 2.41   | mV

Note that I_L,peak does not depend on ESR_Cout.
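As a cross-check of Table 5.5, the equations above can be evaluated directly. The short sketch below simply plugs the stated parameters into (5.4)-(5.8); it is a verification aid, not part of the design files.

# Evaluate equations (5.4)-(5.8) with the parameters stated for Table 5.5.
from math import sqrt

VIN, ETA, IOUT_MAX = 3.3, 0.85, 0.8      # input voltage [V], efficiency, max load [A]
COUT, L, F = 10e-6, 2.2e-6, 2e6          # output capacitance [F], inductance [H], switch freq [Hz]

def ripple(vout: float, esr: float):
    i_ripple = (vout / (VIN * ETA)) * (VIN - vout) / (L * F)   # (5.5)
    i_peak = IOUT_MAX + i_ripple / 2                           # (5.4)
    v_cout = i_ripple / (8 * F * COUT)                         # (5.6)
    v_rout = i_ripple * esr                                    # (5.7)
    v_pp = sqrt(v_cout**2 + v_rout**2)                         # (5.8)
    return i_peak, v_pp

for vout in (1.0, 1.2):
    for esr in (0.1, 0.05, 0.01):
        i_peak, v_pp = ripple(vout, esr)
        print(f"Vout={vout} V, ESR={esr} Ohm: "
              f"IL,peak={i_peak:.2f} A, Vout,p-p={v_pp*1e3:.2f} mV")

The printed values reproduce the entries of Table 5.5, confirming that the table and the equations are consistent.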

Another aspect that was considered was the effect of DC bias on the capacitors in the system. DC bias causes the capacitance of a capacitor to decline. The extent of the effect varies greatly between different types of capacitors, but a ceramic X5R capacitor usually has a curve similar to Fig. 5.7 [22]. The main difference is where the capacitance starts declining rapidly, which is not necessarily related to the voltage rating of the capacitor. To reduce the overall number of different components in the system, any capacitor used in the design had to maintain at least 90 % of its nominal capacitance at 3.3 V. This makes it possible to use the same capacitor values anywhere in the design.

Figure 5.7: Characteristic change of capacitance according to DC-voltage.
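In practice, the 90 %-at-3.3 V criterion can be applied as a simple filter over candidate parts. The sketch below is illustrative only: the part names and derated values are hypothetical, and real numbers must be read from each manufacturer's DC-bias characteristic.

# Illustrative filter implementing the "at least 90 % of nominal capacitance
# at 3.3 V DC bias" selection rule. All candidate data here is hypothetical.
def meets_dc_bias_rule(nominal_uF: float, derated_at_3v3_uF: float,
                       min_fraction: float = 0.90) -> bool:
    """Return True if the part keeps at least min_fraction of its nominal
    capacitance at a 3.3 V DC bias."""
    return derated_at_3v3_uF >= min_fraction * nominal_uF

# (name, nominal C [uF], C at 3.3 V DC bias [uF]) -- hypothetical examples
candidates = [
    ("X5R 0402, 10 uF", 10.0, 7.5),   # heavily derated -> rejected
    ("X5R 0603, 10 uF", 10.0, 9.3),   # within the rule -> accepted
]
for name, c_nom, c_3v3 in candidates:
    print(name, "OK" if meets_dc_bias_rule(c_nom, c_3v3) else "rejected")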

The Equivalent Series Resistance (ESR) of the capacitors was also examined. However, since all capacitors were of relatively small capacitance, ceramic capacitors with naturally low ESR were selected. The ESR could, consequently, be neglected in this application.



Layout

The four copper layers of the PCB were divided into multiple sections to distribute the voltages efficiently. Power pins of the FPGA with the same voltage are often grouped together in the footprint, as illustrated in Fig. 5.8.

Figure 5.8: Artix-7 FPGA power pin mapping. Legend: VCCINT, VMGTAVTT, VMGTAVCC, VCCO, VCCBRAM, VCCAUX, GND; Bank 0, Bank 14, Bank 15, Bank 34, Bank 216 (transceivers).

A suitable solution was to form a number of voltage planes confined to the inner layers of the PCB. Sections where a specific voltage was frequently used were made larger to maximize the inter-plane capacitance to adjacent ground planes. Vias connect the inner and outer layers together to distribute the power between connectors and components. Ground planes on multiple layers provide a low-impedance connection to the regulator.

The layout of the PMIC and the LDO originates from the Layout Examples provided by TI [23][21].

5.7 Programming and debugging interface

There are two parts tied to the programming interface. Firstly, JTAG, which is connected directly to the programming pins of the FPGA; it is used for configuration as well as debugging. Secondly, the SMBus interface, which is used to program the NVM through an SMBus-to-SPI bridge. Upon start-up, the FPGA reads the NVM and loads the stored bitstream into its volatile memory. Programming the memory through the SMBus was desired so that the bridge bitstream can be updated through the platform it is connected to.
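To illustrate what such an update could look like from the host side, the sketch below streams a bitstream file over SMBus in 32-byte block writes, which is the maximum payload of an SMBus block transaction. It is a heavily simplified assumption-based sketch: the device address and command code are placeholders, the real SMBus-to-SPI bridge register map must be used instead, and the erase and verify steps of the SPI flash are omitted entirely.

# Hypothetical host-side sketch using the smbus2 package; addresses and
# command codes below are placeholders, not the real bridge register map.
from smbus2 import SMBus

BRIDGE_ADDR = 0x50      # hypothetical SMBus address of the SMBus-to-SPI bridge
CMD_WRITE_DATA = 0x01   # hypothetical "forward these bytes over SPI" command

def write_bitstream(path: str, bus_id: int = 1) -> None:
    """Stream a bitstream file to the bridge in 32-byte SMBus block writes."""
    with open(path, "rb") as f, SMBus(bus_id) as bus:
        while chunk := f.read(32):  # SMBus block writes carry at most 32 bytes
            bus.write_i2c_block_data(BRIDGE_ADDR, CMD_WRITE_DATA, list(chunk))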

References
