Partial Reconfiguration of a CPRI Implementation on an FPGA


DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018


ALFRED SAMUELSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


© Alfred Samuelson, 27 February 2018

Abstract

Utilizing Partial Reconfiguration (PR) in Field Programmable Gate Arrays (FPGAs) is a digital hardware design concept that has gained in popularity and ease of implementation over the past decades. In short, it means that a limited region of the FPGA is reconfigured during run-time depending on which logic is needed at a given time. This way, the logic utilization of the FPGA can be reduced while still maintaining the same functionality in designs where certain logic blocks are not run in parallel. For example, it has previously proven to be useful in designs containing several types of hardware accelerators which are used by a Central Processing Unit (CPU).

Common Public Radio Interface (CPRI) is a communication interface between two components of a Radio Base Station (RBS): Radio Equipment (RE) and Radio Equipment Control (REC). The specification of the interface outlines a functional split between two different layers. In this master’s thesis, the potential benefits and challenges of applying the concept of Partial Reconfiguration to a CPRI layer 2 FPGA design are investigated. Using an Intel Arria 10 development board, a platform has been designed for evaluation of relevant parameters with focus on resource utilization, bitstream file size and reconfiguration time.

The results do not show clear benefits of utilizing PR in this particular block, mainly because no large reduction in logic utilization is achieved compared to a reference implementation of the block where PR is not utilized. However, important insights for future work on PR implementation of similar circuits have been obtained.


Sammanfattning (Summary)

Partial Reconfiguration (PR) in Field Programmable Gate Arrays (FPGAs) is a design concept for digital hardware whose popularity has increased over the past decades, while the support for the methodology in the implementation tools has improved.

In short, it means that a limited region of the FPGA is reconfigured depending on which logic is needed at a given time, while the rest of the design keeps running. In this way, the logic utilization of the FPGA can be reduced with retained functionality in designs where certain logic blocks do not run in parallel. For example, it has proven useful in designs where several different types of hardware accelerators are used by a Central Processing Unit (CPU).

Common Public Radio Interface (CPRI) is an interface for communication between components of a radio base station. Typically, these are the radio equipment and the component that controls the radio equipment. The CPRI specification outlines a functional split between two different layers. In this thesis, the potential benefits and challenges of applying the concept of Partial Reconfiguration to an FPGA design of a CPRI layer 2 circuit are investigated. A platform based on a development board for Intel Arria 10 is designed in order to evaluate relevant parameters, with focus on resource utilization, bitstream file size and reconfiguration time.

The results do not show clear benefits of using PR for this particular block, mainly because no large savings in logic utilization were achieved compared to a reference implementation of the same block that did not use PR. However, useful insights for future work on PR implementation of similar circuits have been gained.


Acknowledgements

The author would like to sincerely thank the thesis supervisors at Ericsson (Emil Lundqvist) and KTH (Kalle Ngo) for invaluable support, insights and interesting discussions in a wide range of areas along the course of the thesis project. A special thanks to Martin Nilsson at Ericsson for implementation specific support, and to Mårten Kidd at Intel for providing the development board. To the fellow master's thesis students at the Baseband Interconnect unit of Ericsson: thank you for many laughs, good conversations and interesting insights into your thesis topics.

Thank you also to Pierre Rohdin and Yousaf Gulzar at Ericsson for providing the author with the opportunity to write this thesis. Last but not least, the author would like to thank the examiner Johnny Öberg for offering great input in shaping the thesis topic as well as providing the necessary prior knowledge within FPGA design through his class at the department of Electronic System Design at KTH.


List of Figures

2.1 CPRI system example
2.2 Several REs connected to one REC (star topology)
2.3 Several RECs serving one RE
2.4 Several REs cascaded (chain topology)
2.5 REC/RE tree topology
2.6 REC/RE ring topology
2.7 Overview of the CPRI frame substructures
2.8 An overview horizontal slice of an Arria 10 FPGA
2.9 ALM architecture overview
4.1 CPRI data flow through different layers
4.2 Block diagram of CPRI L2 block
5.1 Overview of the system implemented during this thesis project
5.2 Overview of the DMA/PR IP block designed for this project
6.1 Resource utilization diagram
6.2 Chip Planner overview
6.3 Chip Planner detailed view
6.4 Bitstream file size statistics
6.5 Reconfiguration time with a PR region measuring 30x40 cells


List of Tables

2.1 Word length and Control word length for different line rates
4.1 Overview of the PR personas and their settings
5.1 Settings when instantiating the PR IP component in Qsys Pro 16.0
6.1 Bitstream file size statistics


List of Abbreviations

ASIC Application Specific Integrated Circuit
AVMM Avalon Memory-Mapped
BER Bit Error Rate
CBR Constant Bit Rate
CFI Compact Flash Interface
CoE CPRI over Ethernet
CPRI Common Public Radio Interface
CPU Central Processing Unit
CR Cognitive Radio
DPD Digital Pre-Distortion
DMA Direct Memory Access
FEC Forward Error Correction
FFT Fast Fourier Transform
FPGA Field Programmable Gate Array
GSM Groupe Spécial Mobile
IP Intellectual Property
L2 Layer 2
LTE Long Term Evolution
MC-CDMA Multi Carrier Code Division Multiple Access
OFDM Orthogonal Frequency Division Multiplex
PSK Phase Shift Keying
QAM Quadrature Amplitude Modulation
RAN Radio Access Network
RAT Radio Access Technology
RE Radio Equipment
REC Radio Equipment Control
RX Receive
SDR Software Defined Radio
SNR Signal-to-Noise Ratio
TCL Tool Command Language
TX Transmit
UMTS Universal Mobile Telecommunications System


Chapter 1 Introduction

The concept of Partial Reconfiguration (PR) is a rather intuitive utilization of the reprogrammability attribute of Field Programmable Gate Arrays (FPGAs). The idea is to alter the configuration of the FPGA during runtime depending on the required functionality at a given moment [1]. This is particularly useful when only certain blocks are active for each operating mode of a device. Potential benefits of PR include the reduction of size, cost and power consumption of the FPGA.

1.1 Problem

Ericsson’s current Baseband FPGA designs are based on an “All features approach”, meaning that all functionality that could potentially need to be accessed in the field for a certain FPGA is included in the design. The Common Public Radio Interface (CPRI) [2] links in the baseband circuits can be taken as an example.

These need to be able to operate at different line rates (and sometimes use different protocols) depending on the radio unit on the other side of the communication line, which gives rise to physically very large FPGA designs with drawbacks including:

• Long compile times

• High effort required to achieve timing closure

• High power consumption

• High hardware cost since big, premium FPGAs that are generally not produced in large quantity have to be purchased from vendors.


1.2 Purpose

The purpose of this thesis is to evaluate the potential benefits and challenges of using Partial Reconfiguration in FPGA design, as well as to document the design flow for a particular hardware platform. The main focus will be on Ericsson’s baseband designs and work flow, particularly concerning the CPRI links. These are good candidates for evaluation of the PR concept in terms of features and usage (see Section 2.1.2.6). The purpose is not to achieve a PR-based CPRI design actually running in hardware, but rather to compare relevant output from the design tool.

The thesis mainly focuses on the following metrics: reconfiguration time, logic utilization and bitstream file size. Other metrics such as power consumption and effects on efforts for timing closure would also be of interest but were deemed outside of the scope of this project since they could not be analyzed from the compilation reports.

The documentation of the hands-on experience of the design flow is intended for later use as internal reference at Ericsson.

1.3 Relevance

The literature study outlined in Section 2.3 has shown that the aim of this thesis, namely to utilize PR in CPRI line rate configuration, fills a gap in the current state of the art of radio communication equipment. As can be seen in that section, there are many other application areas related to CPRI within telecommunication where PR has been applied. However, no other study has so far (to the knowledge of the author) explored a PR implementation of CPRI.

1.4 Structure of the Thesis

This thesis is structured as follows:

• Chapter 2 provides insight into topics that are necessary to understand in order to follow the rest of the thesis properly, namely CPRI, PR, Related Works and information about the logic resources available for PR on Arria 10 devices.

• Chapter 3 goes on to explain the PR work flow on Intel Arria 10 devices. Design considerations as well as a step-by-step presentation of the flow are given here.

• Chapter 4 goes into more detail about the logic block chosen for study during this thesis, namely Layer 2 of the CPRI block.


• Chapter 5 describes the implemented PR design.

• Chapter 6 outlines the results of the measurements done on the implemented design.

• Chapter 7 analyses the results and the PR work flow.

• Chapter 8 provides conclusions as well as suggestions for future work.


Chapter 2 Background

2.1 Common Public Radio Interface

2.1.1 Overview

CPRI is an industry cooperation between several major telecom equipment companies (Ericsson, Huawei, NEC, Alcatel and Nokia). The purpose of the cooperation is to provide a standardized serial communication interface between the Radio Equipment (RE) and Radio Equipment Control (REC) in Radio Base Stations. RE and REC are often separated due to their different roles in the system, as well as the fact that they may be located at different geographical locations depending on the system architecture. In short, the most common design approach (at least historically) is that the REC handles the radio functions of the digital baseband domain and the RE handles the analog radio frequency functions. Thus, for a given application, it might be desirable to use RE and REC from different vendors and/or different generations of technology. A standardized communication interface enables such flexibility as well as independent technology evolution of REC and RE.


2.1.2 System Description

2.1.2.1 System Components

Figure 2.1: Overview of a system containing an RE and an REC unit as well as a CPRI link connecting them. [Diagram: the REC (with a Network Interface) and the RE (with an Antenna Interface) each contain Layer 1 and Layer 2, with Control & Mgmt., Sync. and User connections at the top of Layer 2. The link between the two nodes is labelled "Digitized Radio Base Station Internal Interface Specification".]

As stated in the CPRI Specification [3], “the RE provides the analogue and radio frequency functions such as filtering, modulation, frequency conversion and amplification.” The REC, on the other hand, “is concerned with the Network Interface transport, the Radio Base Station control and management as well as the digital baseband processing.” The CPRI specification covers only the point-to-point communication interface between two nodes, which can be of either the same or different type (RE or REC). However, each Radio Base Station must contain at least one node of each type. The interface specification enables a variety of different parallel as well as chained topologies of two or more nodes.

2.1.2.2 Example Configurations

The CPRI specification [3] outlines a number of reference configurations for interconnection of RECs and REs, of which some examples are given here in Figures 2.2 through 2.6. The most basic configuration, namely a point-to-point connection between an REC and an RE, can be seen in Figure 2.1. It should be noted that even though a limited number of configurations are explicitly given in the specification, no other type of configuration is precluded given that the implemented system components have sufficient functionality.

Figure 2.2: Several REs connected to one REC (star topology)

Figure 2.3: Several RECs serving one RE

Figure 2.4: Several REs cascaded (chain topology). Cascading of RECs is also possible.

Figure 2.5: Tree topology

Figure 2.6: Ring topology

2.1.2.3 Protocol Layers

As visualised in Figure 2.1, the CPRI interface specifies two layers. Layer 1 is the physical layer and defines properties such as electrical and optical characteristics, time multiplexing of the different data flows and low level signalling. Layer 2 is a data link layer and defines media access control, flow control and data protection of the control and management information flow. The main motivation for the CPRI specification focusing on these two hardware dependent layers was to ensure hardware compatibility in order to facilitate independent technology evolution on both sides of the interface while not limiting product differentiation in higher and parallel layers of the RE or REC as a whole [3].

2.1.2.4 Protocol Data Planes

The data flow is divided into four groups:

Control Plane: Contains control data for call processing

Management Plane: Contains management information for the CPRI link system itself

User Plane: The actual user data that is to be transmitted/received by the Radio Base Station (usually in the form of IQ data)

Synchronization: Used for synchronization and timing between nodes

These data flows are time division multiplexed by Layer 1 when sent over the CPRI link. The connections going into the top of Layer 2 in Figure 2.1 are called Service Access Points (SAPs). These can be used as reference points for performance measurements. As can be seen, the Control and Management data planes share a SAP while the other data planes have one each.

2.1.2.5 Interconnection

CPRI supports optical as well as electrical interconnection. A CPRI link is a bidirectional interface between two directly connected ports, using one transmission line per direction. One port acts as master and the other as slave. In the case of a single connection between an REC and an RE, the REC port shall act as master.

2.1.2.6 Signal/Data Transfer

CPRI is a serial interface that uses time division multiplexing of the different data flows. A wide range of line bit rates, from 614.4 Mbit/s to 24.3 Gbit/s, are included in version 7.0 of the specification. However, the specification only requires that each CPRI compliant RE or REC support at least one of the available line bit rates. Furthermore, two different bit rates are available for the Control and Management channel: a slower one adhering to the High-Level Data Link Control (HDLC) protocol and a faster one adhering to the Ethernet protocol. The actual bit rate of the Control and Management channel will depend on the line bit rate.

On startup, the two nodes perform negotiation in order to synchronize and agree on protocol parameters, starting with the line bit rate controlled by Layer 1 and moving up to higher-level parameters. It is necessary that each node in the link supports at least one protocol configuration that is compatible with at least one on the other side.
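The compatibility requirement can be illustrated with a minimal Python sketch. It is not the negotiation procedure of the CPRI specification: the function names, the example rate lists and the policy of preferring the fastest common rate are assumptions made here purely for illustration.

```python
def common_line_rates(rates_a, rates_b):
    """Return the line bit rates (Mbit/s) supported by both nodes, fastest first."""
    return sorted(set(rates_a) & set(rates_b), reverse=True)


def negotiate(rates_a, rates_b):
    """Pick a common line rate, or None if the two nodes are incompatible.

    Assumption: the fastest common rate is preferred. The text above only
    requires that at least one common configuration exists.
    """
    common = common_line_rates(rates_a, rates_b)
    return common[0] if common else None


# Hypothetical capability lists for an REC and an RE.
rec_rates = [614.4, 1228.8, 2457.6, 4915.2]
re_rates = [1228.8, 2457.6]

print(negotiate(rec_rates, re_rates))   # a rate present in both lists
print(negotiate([614.4], [1228.8]))     # no common rate
```

In a PR-based design, the outcome of such a selection would determine which line rate specific logic needs to be loaded into the reconfigurable region.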

Note: Since the line bit rate is controlled by the hardware layers described in the CPRI specification, partially separate hardware circuits are necessary for each line bit rate supported by the RE or REC. This is a potential area of improvement using Partial Reconfiguration of CPRI links implemented in an FPGA; instead of loading all the necessary logic into the FPGA, only the logic corresponding to the current line bit rate is loaded during negotiation as well as operation.

2.1.3 CPRI Frame Structure

CPRI is a Constant Bit Rate (CBR) protocol, meaning that data is sent continuously at a fixed interval. This differs from, for example, Ethernet, which is a packet-based protocol that sends data sporadically and with different time intervals depending on the workload. Defining CPRI as a CBR protocol makes it easier to adhere to the strict timing and synchronization requirements outlined in [3], which are stipulated in order to ensure robust streaming of the IQ data between the REC and RE.

CPRI data is transmitted serially and arranged in a hierarchical frame structure with three levels: Basic frames, Hyperframes and CPRI frames. The structure and timing are designed to match the LTE frame structure in order to simplify the translation and encapsulation of LTE data, but CPRI can also be used for other Radio Access Technologies (RATs). The frame structure is visualised in Figure 2.7.

Figure 2.7: Overview of the CPRI frame substructures. [Diagram: a CPRI frame contains hyperframes #0 to #149; a hyperframe contains basic frames #0 to #255; a basic frame contains 16 words, of which the first is the control word. The number of bytes per word depends on the line rate.]

• A basic frame consists of 16 words with word length depending on the line rate (see Table 2.1). Creation and transmission of one basic frame to the other side of the CPRI link is completed once every T_c = 1/(3.84 MHz) ≈ 260.417 ns. This is based on the Universal Mobile Telecommunications System (UMTS) chip rate, which is 3.84 MHz. For example, this value of T_c is suitable for transporting one Fast Fourier Transform (FFT) sample for an LTE channel bandwidth of 2.5 MHz, as outlined in [4].

It should be noted that the data is transmitted serially on a single port, bit by bit and byte by byte. Thus, the bytes within a word, which are organized vertically in Figure 2.7, are transmitted serially within T_c, and higher line rates allow for more bytes to be transferred during the same time period.

• A hyperframe consists of 256 basic frames, and thus creation and transmission of one hyperframe is completed once every 256 × T_c ≈ 66.67 µs.

• A CPRI frame consists of 150 hyperframes, which means that creation and transmission of one CPRI frame is completed every 10 ms. This corresponds to one LTE frame.
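The durations of the three frame levels follow directly from the 3.84 MHz rate and the frame counts given in the list. The short Python sketch below reproduces the arithmetic; the constant and variable names are chosen here for illustration, and exact fractions are used only to avoid floating-point rounding in the intermediate steps.

```python
from fractions import Fraction

CHIP_RATE_HZ = 3_840_000            # UMTS rate that defines the basic frame period
BASIC_FRAMES_PER_HYPERFRAME = 256
HYPERFRAMES_PER_CPRI_FRAME = 150

# All durations are in seconds, kept as exact rational numbers.
t_basic = Fraction(1, CHIP_RATE_HZ)                 # T_c
t_hyper = BASIC_FRAMES_PER_HYPERFRAME * t_basic     # 256 * T_c
t_cpri = HYPERFRAMES_PER_CPRI_FRAME * t_hyper       # 150 * 256 * T_c

print(f"basic frame: {float(t_basic) * 1e9:.3f} ns")
print(f"hyperframe:  {float(t_hyper) * 1e6:.2f} us")
print(f"CPRI frame:  {float(t_cpri) * 1e3:.0f} ms")
```

The last value is exactly 1/100 s, which is the 10 ms period of one LTE frame.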

Table 2.1: Word length and Control word length according to the CPRI specification [4] for different line rates

CPRI line bit rate (Gbit/s)    Word length (bits)    Control word length (bits)
0.6                            T=8                   T_cw=T
1.2                            T=16                  T_cw=T
2.5                            T=32                  T_cw=T
3.1                            T=40                  T_cw=T
4.9                            T=64                  T_cw=T
6.1                            T=80                  T_cw=T
8.1, 9.8                       T=128                 T_cw=T
10.1                           T=160                 T_cw=128
12.1                           T=192                 T_cw=128
24.3                           T=384                 T_cw=128

A basic frame always consists of 16 words, but the word length varies depending on the line rate. Table 2.1 shows the word length for the different line rates supported in version 7.0 of the CPRI specification. The first word of each basic frame is a designated control word, meaning that each hyperframe contains 256 control words. These are organized into different sub-channels used to carry the Control & Management as well as Synchronization data. The remaining 15 words of each basic frame are used to carry User plane IQ data.
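The relation between word length and line bit rate can be checked with a short calculation: one basic frame of 16 words of T bits is sent every 1/(3.84 MHz), and the serial line rate additionally includes the line coding overhead. The sketch below is an illustration only. The line coding factors (8B/10B and 64B/66B) and the assignment of a coding to each word length are not stated in this chapter and are assumptions taken from the CPRI specification; the function and variable names are hypothetical.

```python
CHIP_RATE_HZ = 3_840_000        # basic frames per second
WORDS_PER_BASIC_FRAME = 16

# Assumed line coding overhead factors (not given in the text above).
CODING_OVERHEAD = {"8b10b": 10 / 8, "64b66b": 66 / 64}


def line_rate_mbps(word_bits, coding):
    """Serial line bit rate in Mbit/s for word length T and a line coding."""
    payload_bps = WORDS_PER_BASIC_FRAME * word_bits * CHIP_RATE_HZ
    return payload_bps * CODING_OVERHEAD[coding] / 1e6


# (word length T, assumed line coding); T=128 appears with both codings,
# which is why Table 2.1 lists two line rates for that word length.
rows = [
    (8, "8b10b"), (16, "8b10b"), (32, "8b10b"), (40, "8b10b"),
    (64, "8b10b"), (80, "8b10b"), (128, "64b66b"), (128, "8b10b"),
    (160, "64b66b"), (192, "64b66b"), (384, "64b66b"),
]

for word_bits, coding in rows:
    rate = line_rate_mbps(word_bits, coding)
    print(f"T={word_bits:3d} bits, {coding:6s}: {rate:9.2f} Mbit/s")

# 15 of the 16 words in a basic frame carry User plane IQ data.
user_plane_share = 15 / 16
```

Under these assumptions the calculation yields the line rates that Table 2.1 lists in abbreviated form, from 614.4 Mbit/s for T=8 up to about 24.3 Gbit/s for T=384.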


2.1.4 CPRI over Ethernet

CPRI has been a successful industry cooperation and standardization project, allowing for interconnection between radio equipment from different vendors.

However, one idea that has been suggested (for instance in [5]) in order to bring down the cost of both deployment and management as well as to offer more configuration flexibility is to encapsulate the CPRI protocol within a physical Ethernet connection. This approach is called CPRI over Ethernet (CoE). The physical interfaces for Ethernet are more widely available and less costly than the current CPRI interfaces. However, this approach introduces some considerations, especially when it comes to whether or not the jitter can be kept low enough to adhere to the CPRI standard. In one study from 2015, the feasibility of CoE is assessed [6]. The results in that report are not quite conclusive, but a later study which came out in 2017 argues strongly that it is indeed possible to meet the necessary requirements [7].

2.2 Partial Reconfiguration of FPGAs

2.2.1 Overview

Field Programmable Gate Arrays (FPGAs) are reprogrammable digital electronic hardware chips. Development of digital electronic circuits usually involves writing Hardware Description Language (HDL) code, which can then either be implemented by manufacturing a static Application Specific Integrated Circuit (ASIC) or synthesized to configure an FPGA. ASICs offer lower power consumption, more compact design and lower cost per chip when manufactured on a large scale.

However, the development cost is very high and it is not possible to update the deployed hardware after manufacturing.

The reprogrammability of FPGAs has made them popular as prototyping and educational tools. They can also be placed together with ASICs on circuit boards in order to offer some added flexibility and ability to update parts of the design later on. Furthermore, if a digital circuit is not going to be manufactured on a large scale, full FPGA implementation can often yield a lower total cost.

The concept of Partial Reconfiguration (PR) means that only a small region of the FPGA chip is reconfigured. Partial Reconfiguration often also implies Run-Time Reconfiguration (RTR), meaning that the PR region is reprogrammed while the rest of the design (the static region) is still running. Another term sometimes used is Dynamic Partial Reconfiguration (DPR). In this report, PR will be used and will also imply RTR.

PR makes it possible to utilize the full potential of the reprogrammability of FPGAs. Both the duration and the power consumption of the reconfiguration phase are greatly reduced when compared to reconfiguring the whole chip in order to alter the functionality. Furthermore, there is no period of time during which the whole chip is unavailable, which greatly simplifies the integration with other components since the static region can handle the necessary communication mechanisms and signalling during the reconfiguration phase. With PR, it is possible to construct hardware circuits that adapt to dynamic conditions. It combines the flexibility of software with the speed and reliability of hardware.

Some FPGA designs are more suitable for PR implementation than others. In general, designs that offer a range of different modes of operation, and where different logic is used for the different modes, have the highest potential when it comes to resource savings using PR. Drawbacks such as overhead in terms of supporting logic and reconfiguration time (see Chapter 3) must be taken into consideration when deciding whether to use PR in an implementation or not.
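The trade-off can be made concrete with a rough, first-order model: a static design holds the logic for all modes at once, whereas a PR design needs a region large enough for the largest mode plus the supporting logic. The Python sketch below is such a model under simplifying assumptions; the function names, the overhead parameters and all numbers are hypothetical and do not come from the measurements in this thesis.

```python
def static_logic(mode_sizes, mux_overhead=0):
    """Logic needed when the logic of every mode is resident at the same time."""
    return sum(mode_sizes.values()) + mux_overhead


def pr_logic(mode_sizes, pr_overhead=0):
    """Logic needed when a single PR region hosts one mode at a time.

    The region must fit the largest mode; pr_overhead models the
    supporting logic, such as a reconfiguration controller.
    """
    return max(mode_sizes.values()) + pr_overhead


def relative_saving(mode_sizes, mux_overhead=0, pr_overhead=0):
    """Fraction of logic saved by PR (negative if PR costs more)."""
    static = static_logic(mode_sizes, mux_overhead)
    partial = pr_logic(mode_sizes, pr_overhead)
    return (static - partial) / static


# Hypothetical mode sizes in arbitrary logic units.
modes = {"mode_a": 4000, "mode_b": 5000, "mode_c": 3000}
print(relative_saving(modes, pr_overhead=1000))

# With a single mode there is nothing to share, so the overhead dominates.
print(relative_saving({"only_mode": 5000}, pr_overhead=1000))
```

The model ignores logic that is shared between modes in a static design, which is one reason why real savings can be considerably smaller than this estimate suggests.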

2.2.2 History

Ideas about reconfigurable computing hardware can be found in publications as early as [8] and [9], both from 1978. In those articles, the idea was to increase computation efficiency when the word length of the input was less than the full word length of the processor. Although the idea of dynamically adaptive hardware is related to PR, the actual implementation differed from the method discussed in this thesis.

During the last few decades, FPGA technology and PR capability have developed rapidly along with the rest of the electronics industry. During 2000-2010, much research was carried out in order to develop reliable implementation techniques and design frameworks for FPGA designs utilizing PR. A few examples of publications from that time period are:

• J.H. Pan et al.'s IEEE conference paper from 2004 about a technique for compressing the bitstream used to program the PR region [10].

• The article from 2005 by Cindy Kao in Xilinx’s journal XCell about general benefits of PR [11].

• S. Liu et al.'s technical report from 2009 about an approach to reducing the reconfiguration time overhead by utilizing a Direct Memory Access (DMA) streaming engine [12].

• M. Liu et al.'s conference paper from 2009, which is also about reducing the reconfiguration time overhead, but this time by utilizing parallel PR regions and virtual configurations, albeit at the cost of added design size [13].

• Another conference paper from 2009 headlined by M. Liu but with different co-authors, describing a design framework that takes aspects such as hardware processes, system interconnections, Operating Systems (OS), device drivers, scheduler software and context switching into consideration [14].

For quite some time, PR was not supported by all FPGA vendors. Furthermore, until a few years ago the design process in the available tools was rather cumbersome. During the second half of that decade, more reports that targeted actual use cases rather than development of the concept of PR itself started surfacing. A few examples targeting the telecom industry are [15], [16] and [17]. The use cases mentioned in these reports will be elaborated on in Section 2.3.

2.3 Related works and application areas

2.3.1 PR in General

One significant piece of work is the PhD thesis [18] from 2011 by Dr. Ming Liu. It contains a thorough investigation of design methodology, application areas, potential and limitations of PR. The main focus is to develop hardware for particle physics experiments, but the thesis also includes case studies and designs that can be useful in other application areas. For example, a PR design including one configuration acting as a controller for an external SRAM memory and another acting as a controller for an external flash memory was implemented. This is a somewhat similar use case to the one investigated in this thesis, since both concern peripheral communication operating exclusively in a single mode at any given time. Liu’s results with regard to resource savings are encouraging. For example, 43.7% of the 4-input Look-Up Tables (LUTs) were saved when using PR instead of a static design implementation containing both controllers and switching mechanisms between the two. One significant difference between the approach in [18] and the approach in this thesis is that software was used for handling the reconfiguration flow in the former, while the latter aims to implement the whole reconfiguration flow in hardware. Some of the articles published during the work on the PhD thesis are [14], [12] and [13].

Another PhD thesis which has significantly contributed to the knowledge in the field is [19] from 2015 by Dr. Byron Navas. Here, a platform for easy implementation of PR designs, called RecoBlock, is described and applied to interesting areas such as self-healing and cognitive Systems on Chip (SoC). A shorter introduction to the RecoBlock system is provided in [20].

In the following subsections, previous work in common application areas for PR will be reviewed. For the interested reader, some articles that investigate PR in application areas that are intriguing but not quite connected to the focus of this thesis are: [21] (FPGA debugging), [22] (Artificial Neural Networks), [23] (security issues when PR is controlled remotely) and [24] (fault tolerance for space applications).

2.3.2 Hardware Accelerators

One common application area for PR is custom, time-multiplexed hardware accelerators for SoCs. The idea is quite alluring: to be able to access a hardware accelerator customized for many different types of processes while maintaining a small circuit footprint. In this application area, however, the ratio between reconfiguration time and possible speedup is essential to how beneficial the PR technique is. Quite intuitively, PR is more beneficial in cases where many operations on the same type of accelerator are performed in a row than when context switching is frequent. One example of a report where these kinds of trade-offs are investigated for different use cases and architectures is [25].
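This intuition can be expressed as a simple break-even condition: loading an accelerator pays off once the time saved over a run of consecutive operations exceeds the reconfiguration time. The Python sketch below illustrates the condition with hypothetical timing values; the function name and the numbers are assumptions and are not taken from [25].

```python
import math


def break_even_operations(t_reconfig, t_software, t_accelerated):
    """Smallest number n of consecutive operations for which
    n * (t_software - t_accelerated) > t_reconfig.

    Returns None if the accelerator is not faster than software.
    All times must use the same unit.
    """
    gain_per_operation = t_software - t_accelerated
    if gain_per_operation <= 0:
        return None
    return math.floor(t_reconfig / gain_per_operation) + 1


# Hypothetical timings in microseconds.
n = break_even_operations(t_reconfig=10_000, t_software=1_000, t_accelerated=200)
print(n)
```

If the expected run of operations between two context switches is shorter than this break-even value, the reconfiguration overhead outweighs the speedup.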

Since the use case studied in this thesis does not involve frequent context switching, the timing constraints differ from those that apply when PR is used to implement hardware accelerators. The reconfiguration time is certainly of interest in order to minimize system downtime, but context switching will occur at most on the order of once per second due to the line rate negotiation protocol of CPRI links. On startup, a line rate negotiation takes place between RE and REC, where a new line rate is tested if synchronization has not been reached within a pre-determined time limit. The exact time limit is allowed to vary between 0.9 and 1.1 seconds depending on the implementation [3]. However, hardware accelerators are frequently used in the closely related field of Software Defined Radio (SDR), see Section 2.3.3 below.

2.3.3 Software Defined Radio

SDR is a concept where certain radio functionality such as modulation, demodulation, encoding and decoding is handled by software rather than dedicated hardware. It has become increasingly popular due to its flexibility, the increased number of diverse radio protocols and the reduced size but increased computational power of CPUs. Since partially reconfigurable FPGAs offer the opportunity to load custom hardware accelerators depending on the current protocol while maintaining a compact design, PR is a popular implementation technique for SDR.

In 2008, E.J. McDonald wrote an article [26] which has been cited by several later works in the area. Aside from describing aspects of the PR design method in general, it more specifically discusses the feasibility and benefits of using PR within SDR. An example is given of a simplex transceiver architecture where the Forward Error Correction (FEC) block is reconfigured during runtime. This work showed that even though the vendor tools for PR implementation were still under development, the technology had matured to a level where it was feasible to include the PR design method in industrial products and not just in academic experiments. Another article from around the same time discussing similar topics is [16] by Delahaye et al., published in 2007.

In the last few years, there have been several case studies implementing different functional blocks and hardware accelerators for SDR by using PR. A few examples are:

• A.M. Lalge et al.'s conference paper from 2015 discussing Phase Shift Keying (PSK) modems [27].

• A. Hassan et al.'s IEEE conference paper from 2015 reviewing the performance of different techniques used for programming the PR region, applied to the use case of a convolutional encoder [28].

• A.K. Nahar et al.'s journal article from 2017 discussing Multi Carrier Code Division Multiple Access (MC-CDMA) [29].

One researcher who has published many case studies implementing different SDR function blocks using PR is Arun Kumar from the Centre for Development of Advanced Computing in Trivandrum, India. These include [30] about variable Quadrature Amplitude Modulation (QAM) modes, [31] about Digital Pre-Distortion (DPD), [32] where an OFDM transmitter is implemented with the use of a partially reconfigurable IFFT module, and lastly [33] which is another study of the implementation of PSK modems using PR added to the one in [27].

To the author’s knowledge, no work has been published about CPRI implementation using PR. Since CPRI is intended as a communication interface between the RE (possibly utilizing SDR) and the REC, it does not necessarily fall under the SDR function block category. However, it does operate in close conjunction with the SDR blocks and shares applicable demands such as timing and space occupation constraints. Hence, the above-mentioned works are highly relevant to this thesis, which in turn is relevant to the design of both software defined and hardware based radio systems.

2.3.4 Cognitive Radio

One field that can be seen as an evolution of SDR is Cognitive Radio (CR). The idea is that the radio system changes its parameters and operating modes depending on sensed dynamic network and user conditions such as Bit Error Rate (BER), Signal-to-Noise Ratio (SNR) and channel occupancy. The benefits of implementing PR in CR applications are similar to those for SDR, but the reconfiguration time overhead becomes more crucial in the case of CR due to the increasingly dynamic nature of the system. Articles exploring the CR concept can be found in [17], [34] and [35]. Furthermore, many of the studies regarding PR in SDR are relevant to CR as well. For example, [36] regarding OFDM in CR is closely related to [30], which has already been cited in Section 2.3.3.

2.3.5 Dynamic CPRI line rate

In addition to the CoE approach mentioned in Section 2.1.4 and in the same spirit as CR, a concept that has been explored is to dynamically alter the CPRI line rate of a connection depending on the current network demand. Currently, the CPRI standard in [3] defines a constant line bit rate after the initial start-up negotiation. This means that the link has to be configured to handle the worst-case scenario of network load and bandwidth. To reduce power consumption and operational cost, a reconfigurable CPRI line rate would be desirable. In [37] and [38], this idea combined with CoE is explored. The line rate is configured by sending a RESET message which re-initiates the line rate negotiation process. If dynamic CPRI line rate were implemented in future RANs, it would further increase the relevance and potential benefit of implementing CPRI (and more specifically the line rate configuration blocks) using PR.

2.3.6 Reconfigurable Ethernet Interface

The most closely related implementation case to this thesis that has been found is [39], which describes a reconfigurable Ethernet controller utilizing PR in an FPGA. It has the capability of switching between two different Ethernet protocols. The implementation and verification of functionality is thorough, but unfortunately no data is provided regarding savings in power consumption or area utilization. Furthermore, there is no mention of the reconfiguration time overhead compared to a fully static design.

2.4 A Note on Platform Technology

In the vast majority of the works analysed during the literature study which contained an actual implementation of PR in an FPGA, Xilinx was the vendor. The reason for this can only be speculated on. It seems like Xilinx historically has offered the most tools and least restrictions when it comes to PR. This thesis does not aim to make a platform comparison or claim to be able to argue for one or the other, but it is clear that presently, Altera (recently acquired by Intel) also offers extensive PR capabilities and tools. A fair number of previously published articles describe implementations on Altera FPGAs, for example [35], [39] and [40]. However, the value and relevance of this thesis is increased by the fact that published work for the specific platform used (Intel Arria 10) seems scarce, at least within this specific application area.


2.5 Logic Resources available for PR on Arria 10

[Figure 2.8 column labels: Transceiver Channels; Hard IP Per Transceiver (Standard PCS, PCIe Gen3 PCS, Enhanced PCS); PLLs; Variable Precision DSP Blocks; M20K Internal Memory Blocks; I/O PLLs; Hard Memory Controllers, General-Purpose I/O Cells, LVDS; Core Logic Fabric; PCI Express Gen 3 Hard IP. Orientation markers: North, East.]

Figure 2.8: An overview horizontal slice of an Arria 10 FPGA, showing how the resource blocks are distributed. The orientation of the chip has been rotated 90 degrees in order to make it easier for the reader to view the text in the columns.

Only the core logic of the FPGA can be part of the PR region on Arria 10 devices. This section outlines the different logic resources which can be used for Partial Reconfiguration.


[Figure 2.9 block labels: Adaptive LUT with inputs In 1 to In 8; two Full Adders; two registers (Reg).]

Figure 2.9: ALM architecture overview

2.5.1 Adaptive Logic Modules

Adaptive Logic Modules (ALMs) are the basic building blocks of Intel FPGAs. As described in [41], they are small logic blocks that can be configured to perform a variety of combinational or sequential functions. The functionality of the whole FPGA is determined by the combined configuration of all of the ALMs it contains. As a reference, the Intel Arria 10 GX 1150 used for this thesis contains 427,200 ALMs [42].

Figure 2.9 shows an overview of the architecture of an ALM. The adaptive Look-Up Table (ALUT) module, the adders, the multiplexers (MUXes) and the registers are configured according to the compiled design files, which are optimised by the Quartus compiler to fit the ALM structure.

2.5.2 Logic Array Blocks

A Logic Array Block (LAB) consists of ten ALMs, organized in a column. It provides a local interconnect between its ALMs which enables fast communication for implementing functions that require more than one ALM. This is further improved by the fact that the ALMs within a LAB share carry chains and arithmetic chains between their adders and LUTs, respectively.

Up to a quarter of the available LABs on an Arria 10 FPGA can be configured as Memory LABs (MLABs), acting as a dual-port SRAM with a maximum size of 640 bits. This is done by configuring each ALM as a 32x2 LUT-based memory.
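As a quick check of the figures above, the 640-bit MLAB capacity follows directly from ten ALMs, each configured as a 32x2 memory. The following Python sketch reproduces that arithmetic. The helper name mlabs_needed is hypothetical, and it only gives a lower bound, since it ignores how a real memory has to be shaped to fit the 32-word depth.

```python
import math

ALMS_PER_LAB = 10    # a LAB consists of ten ALMs (see text)
WORDS_PER_ALM = 32   # each ALM is configured as a 32x2 LUT-based memory
BITS_PER_WORD = 2


def mlab_capacity_bits() -> int:
    """Capacity of one MLAB in bits: 10 ALMs x 32 words x 2 bits."""
    return ALMS_PER_LAB * WORDS_PER_ALM * BITS_PER_WORD


def mlabs_needed(memory_bits: int) -> int:
    """Lower bound on the number of MLABs for a memory of the given size.

    Hypothetical helper: it only divides by the raw capacity and does not
    model depth/width shaping done by the compiler.
    """
    if memory_bits < 0:
        raise ValueError("memory_bits must be non-negative")
    return math.ceil(memory_bits / mlab_capacity_bits())
```

For example, mlab_capacity_bits() evaluates to 640, matching the maximum size stated above.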


2.5.3 Embedded Memory Blocks

Aside from the MLABs mentioned above, the Arria 10 FPGAs provide dedicated 20 Kb memory blocks called M20Ks. These are suitable for large memory arrays, while the MLABs are better suited for wide and shallow arrays used in, for example, shift registers for DSP applications and filter delay lines [41].

2.5.4 DSP Blocks

The Arria 10 devices include dedicated DSP blocks with reconfigurable logic supporting configurations that are optimized for certain arithmetic operations. Fixed as well as floating point is supported, and the supported arithmetic operations include, for example, real and complex multiplication, systolic FIR filters and vector operations. It should be noted that not all operations are supported for both fixed and floating point arithmetic.


Chapter 3 PR Design Work Flow on Intel Arria 10

3.1 Special Considerations for PR Designs

Implementing a PR FPGA design alters the regular design flow and introduces some special considerations. This subsection outlines the most significant ones. Many are technology-independent, but the statements here do not necessarily hold true for platforms other than Intel Arria 10 or software versions other than Quartus Pro 16.

3.1.1 Defining Personas

One of the most obvious differences when moving from a flat design^∗ to a PR design is that the logic block which is going to be replaced depending on the mode of operation needs to be identified and isolated. Then, different versions of that logic block need to be defined and designed. The different versions are referred to as personas in the Intel documentation [1], and the same term will be used henceforth in this thesis.

One important aspect when designing the different personas is that the ports of the HDL entity need to be identical regardless of persona. This is necessary in order to ensure compatibility with the logic outside the PR region. The compiler takes care of mapping each port to the exact same location in the FPGA for each persona.

∗ A design where all different modes of operation are implemented in a single entity


3.1.2 Floor planning

The region(s) which shall be reconfigured depending on mode of operation (the PR region(s)) must be specified. This imposes additional floor planning efforts compared to flat designs. See Section 3.2 for more details.

3.1.3 Storage of Personas

The files containing the configuration data of the PR region for each persona need to be stored somewhere. Depending on the file size, the personas can be stored either on the FPGA itself or in an off-chip memory. For demonstration and experimentation purposes, the bitstream can be transferred from the Quartus Programmer on a PC via a JTAG programming cable.

3.1.4 PR Control Logic

Dedicated logic is necessary in order to facilitate the reconfiguration of the PR region. The main functionality that needs to be achieved by this logic is:

• Reading the persona bitstream from memory

• Using the persona bitstream to reconfigure the PR region

• Freezing the ports of the PR region to known values during reconfiguration^∗

• Possibly, depending on whether it is necessary for the design at hand: performing handshaking with the logic block inside the PR region before the PR process is initiated.

When implementing PR designs on the Intel Arria 10 FPGA, a PR IP block is available in the Quartus 16 IP Catalog. This IP block handles the reconfiguration of the PR region and provides a single-bit freeze output which can be used as an enable signal for the I/O freeze logic. The freeze logic itself, however, needs to be implemented by the designer^†. Furthermore, custom control logic needs to be implemented in order to transfer the persona bitstream from memory to the PR IP block and to send the PR initiation signal to the PR IP block. If handshaking logic is necessary for robust functionality of the design at hand, that needs to be custom made as well^‡.

∗ It is not required to freeze the input ports of the PR region on Intel Arria 10 PR designs.

^† For AVMM interfaces, there are freeze bridge IP blocks in the Quartus 16 IP Catalog.

^‡ In Quartus 17, extended handshaking and control logic is available in the IP Catalog.
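The responsibilities of the PR control logic listed in this subsection can be illustrated with a small behavioural model. The Python sketch below is not an Intel API and does not model the PR IP block itself; the class PrRegionModel, the function reconfigure and the chunk size are all hypothetical names chosen for illustration. It only shows the ordering: optional handshake, freeze, bitstream transfer, release of freeze.

```python
from dataclasses import dataclass, field


@dataclass
class PrRegionModel:
    """Toy stand-in for a PR region and its freeze logic (hypothetical)."""
    persona: str = "base"
    frozen: bool = False
    log: list = field(default_factory=list)


def reconfigure(region, persona_name, bitstream, handshake=None, chunk_size=4):
    """Model one partial reconfiguration of `region`.

    handshake: optional callable returning True when the logic inside the
    PR region is ready to be replaced (assumed design-specific).
    """
    if handshake is not None and not handshake():
        region.log.append("handshake refused")
        return False
    # Freeze the PR region outputs before any configuration data is written.
    region.frozen = True
    region.log.append("freeze asserted")
    try:
        # Read the persona bitstream in chunks and hand it to the PR logic.
        for offset in range(0, len(bitstream), chunk_size):
            chunk = bitstream[offset:offset + chunk_size]
            region.log.append(f"wrote {len(chunk)} bytes")
        region.persona = persona_name
    finally:
        # Always release the freeze, even if the transfer fails.
        region.frozen = False
        region.log.append("freeze released")
    return True
```

A call such as reconfigure(PrRegionModel(), "s_9p8", bytes(10)) walks through the freeze, three chunk writes and the release in that order.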


3.1.5 Reconfiguration time

It is important to analyze the design’s sensitivity and tolerance to the time it takes to re-configure the PR region, as well as how often reconfiguration will take place. This will have a great impact on the suitability of utilizing PR in the design. For example, it might be the case that certain deadlines in the system cannot be met if the logic inside the PR region is unavailable for too long. In that case, PR might have to be discarded altogether. In other cases where critical requirements can still be met, the usefulness of utilizing PR versus a flat design can still largely depend on the reconfiguration time of the PR region. One example of such a case is the application of reconfigurable hardware accelerators discussed in Section 2.3.2.
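A first-order way to reason about this trade-off is to estimate the reconfiguration time as bitstream size divided by the sustained transfer rate into the PR logic, and compare it with the system deadline. The Python sketch below does exactly that. It is a simplification under stated assumptions: the function names are hypothetical, fixed overheads are ignored, and any numbers passed in are illustrative and not measurements from this thesis.

```python
def reconfiguration_time_s(bitstream_bytes: int, throughput_bytes_per_s: float) -> float:
    """First-order estimate: transfer time only, no fixed overhead modelled."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return bitstream_bytes / throughput_bytes_per_s


def meets_deadline(bitstream_bytes: int, throughput_bytes_per_s: float,
                   deadline_s: float) -> bool:
    """True if the PR region would be back in service before the deadline."""
    return reconfiguration_time_s(bitstream_bytes, throughput_bytes_per_s) <= deadline_s


def availability(reconfig_time_s: float, interval_s: float) -> float:
    """Fraction of time the PR region is usable when it is reconfigured
    once every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return max(0.0, 1.0 - reconfig_time_s / interval_s)
```

With an assumed 2 MB bitstream and an assumed 100 MB/s transfer rate, the estimate is 20 ms, which would satisfy a 50 ms deadline but not a 10 ms one.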

3.1.6 Compilation

Using the Quartus Shell from a terminal window, a PR compilation TCL^∗ script can be generated that will handle the whole compilation process.

Even though the static region (the area outside the PR region(s)) only needs to be compiled once, synthesis and placement of each persona in the PR region need to be performed. This increases the compile time significantly compared to compiling a flat design (see Section 6.3). If only a certain persona (or only the static region) needs to be re-compiled, this can be done by passing certain arguments when executing the TCL script.

3.2 Work Flow

This section summarizes the PR design work flow for Intel Arria 10 in Quartus Pro 16, as described in [43]. An example design and walk-through can be found in [44].

3.2.1 Planning the Design

When planning the design, the first thing to do is to identify logical hierarchical boundaries which can be defined as reconfigurable partitions. The design hierarchy and source code should be set up to support this partitioning. Keep in mind that only core logic can be used in the reconfigurable partition(s), and not periphery resources such as I/O blocks and transceivers.

∗ Tool Command Language


3.2.2 Creating PR Partition(s)

A separate design partition must be created and set as reconfigurable for each PR region that is to be included in the design. A design partition in itself does not specify a physical area on the FPGA, but is merely a logical partitioning of the design. However, LogicLock Plus region assignments shall be used in order to specify the placement of the PR Region (see Section 3.2.3).

3.2.3 Floorplanning

In order to ensure that the location of the PR region and its associated ports remain the same between all different personas, the location of the PR partition must be specified via a LogicLock placement region assignment. The proportions of the assigned region affect the persona bitstream file size as well as the partial reconfiguration time; increased height will yield increased reconfiguration time and file size even if the total region size is kept constant. This is due to the way that the logic resources are addressed by the FPGA configuration files, namely that the smallest addressable configuration segment is aligned along the rows and not the columns of the FPGA. A routing region which is at least one unit larger than the placement region must also be specified. As previously mentioned, only core logic can be included in the PR region. This must be considered during floorplanning.
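The two floorplanning rules above can be expressed as simple checks on rectangle coordinates. The Python sketch below is only an illustration: Region, routing_region_ok and rows_spanned are hypothetical names, "one unit larger" is assumed here to mean one unit on every side, and the row count is used merely as a proxy for the height effect described in the text, not as a Quartus formula for bitstream size.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Rectangle in FPGA grid coordinates, corners inclusive (hypothetical)."""
    x0: int
    y0: int
    x1: int
    y1: int


def routing_region_ok(placement: Region, routing: Region, margin: int = 1) -> bool:
    """Check that the routing region extends at least `margin` units beyond
    the placement region on every side (assumed interpretation)."""
    return (routing.x0 <= placement.x0 - margin
            and routing.y0 <= placement.y0 - margin
            and routing.x1 >= placement.x1 + margin
            and routing.y1 >= placement.y1 + margin)


def area(region: Region) -> int:
    """Number of grid units covered by the region."""
    return (region.x1 - region.x0 + 1) * (region.y1 - region.y0 + 1)


def rows_spanned(region: Region) -> int:
    """Height of the region in rows; a taller region of equal area spans
    more rows, which the text associates with a larger bitstream."""
    return region.y1 - region.y0 + 1
```

Comparing Region(0, 0, 39, 9) with Region(0, 0, 9, 39) shows two regions of equal area where the second spans four times as many rows.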

3.2.4 Instantiating PR IP Core

The PR IP Core performs the reconfiguration of the PR region, and can be instantiated in the base revision via the IP Catalog or in Qsys. It can be configured to have either an AVMM, JTAG or conduit^∗ interface. It must be controlled by either an external host, for example an off-chip processor, or an internal host, for example custom logic designed to integrate well with the rest of the design. The host must also facilitate the transfer of the persona bitstreams from their memory storage to the PR IP Core.

3.2.5 Defining Personas

As previously described, a persona is one of the possible logic configurations of a PR region. They should all be described by different HDL code, but it is important that all the personas use the exact same set of ports to connect to the static region outside the PR region. The physical placement of the ports must be the same as well, but the compiler takes care of this^† as long as the port declarations of the HDL code describing each persona of a PR region are identical.

∗ Custom individual signal interface

^† It is possible to specify location assignments for the I/O ports as well
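The requirement that every persona exposes an identical port list is easy to verify mechanically before starting a long compilation. The Python sketch below assumes the port declarations have already been extracted from the HDL into (name, direction, width) tuples; that extraction step, the function name port_mismatches and the example port names are hypothetical.

```python
def port_mismatches(personas: dict, reference: str) -> dict:
    """Compare each persona's port list against a reference persona.

    personas: maps persona name -> iterable of (name, direction, width).
    Returns a dict mapping each deviating persona to the sorted list of
    ports that appear in only one of the two port sets.
    """
    ref_ports = set(personas[reference])
    report = {}
    for name, ports in personas.items():
        diff = ref_ports.symmetric_difference(ports)
        if diff:
            report[name] = sorted(diff)
    return report
```

An empty result means all personas agree with the reference; any entry lists the ports that differ and therefore need to be reconciled.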

The different personas are later associated with different project revisions, see Section 3.2.6. As a starting point, a base revision describing the full FPGA design and instantiating the most complex persona should be developed. The base revision will define the static region. It must include freeze logic which uses a control signal (one is available from the PR IP Core) to freeze the output ports of the PR region during partial reconfiguration. This is necessary because the values of the output ports are unknown during the partial reconfiguration process. It is not required to implement freeze logic for the input ports of the PR region(s) in Arria 10 PR designs.

3.2.6 Creating Revisions for Personas

In PR designs, there are three types of project revisions which must be created. The base revision is used to compile the static region, and is the revision on which all the other revisions are based. It describes the full FPGA design and should instantiate the most complex persona in order to maximize the possibility of discovering timing and/or fitting errors early.

One synthesis-only revision must be created for each persona of a PR region. It is solely used to synthesize the logic within the PR region for a certain persona. Its top component shall be specified as the top component of the corresponding persona, as opposed to the base revision whose top component is the top component of the whole FPGA design.

One implementation revision must also be created for each persona of a PR region. The only difference from the base revision is the revision type assignment. An implementation revision is later associated with a specific synthesis revision in the partial reconfiguration compilation setup script, see below.

3.2.7 Compiling Design

Using the Quartus shell on the command line, a partial reconfiguration compilation Tool Command Language (TCL) script template can be generated and then adapted to the current design. All the revisions must be referenced, and the synthesis revisions must be associated with their respective implementation revisions. This can be done by editing the example setup script that is generated along with the compilation script. The compilation script can then be run from Quartus in order to perform the necessary compilation steps.
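The bookkeeping implied by the revision scheme, one base revision plus one synthesis-only and one implementation revision per persona, can be sketched as follows. This Python snippet does not generate or parse the Quartus setup script; the function names and the naming pattern for revisions are hypothetical and only illustrate which revisions must exist and how they pair up.

```python
def plan_revisions(project: str, personas: list) -> dict:
    """Build the set of revisions needed for a PR project.

    Returns the base revision name and, per persona, the synthesis-only
    revision and the implementation revision it is associated with.
    """
    return {
        "base": f"{project}_base",
        "personas": {
            p: {"synth": f"{project}_{p}_synth", "impl": f"{project}_{p}_impl"}
            for p in personas
        },
    }


def all_revisions(plan: dict) -> list:
    """Flatten the plan into the list of revisions to reference."""
    names = [plan["base"]]
    for pair in plan["personas"].values():
        names.extend([pair["synth"], pair["impl"]])
    return names
```

For N personas the plan contains 2N + 1 revisions, which also indicates how the compilation effort grows with the number of personas.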


3.2.8 Programming FPGA and memory

Running the compilation script generates files that can be used to program the FPGA and perform partial reconfiguration. Depending on the implementation, different files can be used. As an example, in the project for this thesis, RBFs (Raw Binary Files) generated by the compilation script and corresponding to each persona were converted into flash format for programming into an external flash memory on the same board as the FPGA. The FPGA was then programmed with a SOF (SRAM Object File) corresponding to one of the implementation revisions.


Chapter 4 CPRI L2 Block

4.1 Function in top CPRI block

[Figure 4.1 labels: User Plane (IQ Data); Control & Management Plane (Vendor Specific, Ethernet, HDLC, L1 Inband Protocol); Sync; Layer 2; Layer 1 (Time Division Multiplex, Optical Transmission, Electrical Transmission).]

Figure 4.1: Overview of how the data flows through different streams in the different layers of the CPRI block.

As can be seen in Figure 4.1, Layer 2 of the CPRI block is the interface between the higher layers of operation and the physical layer. Data flows in different formats and sub channels (IQ data, vendor specific, Ethernet etc.) from the three main channels or, in other words, logical connections: User Plane, Control & Management Plane and Sync. Layer 2 is responsible for arranging the different data flows into the correct frame format, outlined in Section 2.1.3. Layer 1 then handles the time multiplexing of the correctly arranged data.

As defined in [3] and outlined in Section 2.1.2, Layer 2 is the data link layer which shall handle media access control and flow control of the different data streams. It also handles data protection of the control and management information flow.

4.2 Architecture

[Figure 4.2 block labels as extracted: SW Control Register; Sync; CPRI TX; CPRI Control; CPRI RX; Reset Ctrl; Gearbox; HDLC; Config; Ethernet; Vendor; IQ; L1 inband; L1 Status; L1 Service Data; Interface Bridges.]

Figure 4.2: Block diagram of the CPRI L2 component instantiated in the PR region in this project. Due to confidentiality, the block diagram has been simplified and some blocks have been anonymized.

Figure 4.2 shows an overview of the architecture of the CPRI L2 block instantiated in the PR Region in this project. In the transmitting direction, the higher-level data flows of the interfaces to the left are encoded and packed into the correct frame structure in order to be presented correctly to the L1 interfaces to the right. This process is done in reverse in the receiving direction; the signals from the L1 interfaces to the right are decoded, unpacked and sent out on the correct service interfaces to the left.

The following sections describe the function of the different sub blocks.


4.2.1 SW Control Register

This sub block is used for software configuration of the other sub blocks, and also provides status messages. Furthermore, it handles the IRQ generation from alarm signals of different blocks.

4.2.2 Sync

The Sync block handles the synchronization of the TX and RX signals, mainly by generating or receiving strobe signals (depending on whether the CPRI block is in Master, Slave or Partner mode) and delaying these depending on the line rate.

4.2.3 Service Interface Bridges

The function of the Service Interface Bridges in the TX direction is to perform the necessary operations to arrange the data coming in from the service interfaces (HDLC, Ethernet, Vendor Specific, IQ and L1 inband) into the right format before it is forwarded to the CPRI TX block. The reverse operation is done in the RX direction; the respective signals from the RX block are received and re-arranged into the right format for the service interfaces.

There are several reasons why these service interface bridges are necessary. One example is that the IQ Service Interface Bridge folds/unfolds (depending on the direction) the IQ data in order to densify it and free up more space in the CPRI frame for other signals. This data format is not suitable for higher layers since they rely on deterministic data timing and frame structure.

Another reason why the service interface bridges are necessary is that certain data is scrambled in order to maintain a regular ratio of ones and zeros on the line, which is beneficial since the clock on the Slave side is derived from the signal itself.
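To illustrate why scrambling helps balance ones and zeros, the Python sketch below implements a generic additive scrambler, where the data is XORed with the output of a linear feedback shift register. This is not the scrambler defined by the CPRI specification or used in the thesis design; the 7-bit register, the tap positions and the seed are assumptions chosen only to keep the example small.

```python
def lfsr_bits(seed: int, length: int, taps=(7, 6), nbits: int = 7) -> list:
    """Generate `length` keystream bits from a Fibonacci LFSR.

    taps are 1-based bit positions XORed to form the feedback bit
    (illustrative choice, not taken from the CPRI specification).
    """
    mask = (1 << nbits) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(length):
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        bits.append(feedback)
        state = ((state << 1) | feedback) & mask
    return bits


def scramble(data_bits: list, seed: int = 0b1111111) -> list:
    """XOR the data with the keystream; applying it twice with the same
    seed restores the original data."""
    keystream = lfsr_bits(seed, len(data_bits))
    return [d ^ k for d, k in zip(data_bits, keystream)]
```

Because the operation is a plain XOR with a reproducible keystream, the receiver descrambles by calling scramble again with the same seed. A long run of zeros on the input no longer appears as a long run of zeros on the line.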

4.2.4 CPRI Control

The CPRI control block contains control functions that are necessary to meet some of the requirements of the CPRI Specification, mainly when it comes to timing and synchronization. For example, this is where CPRI line delay calculation and compensation is done. The block also filters the control word extracted from the CPRI frame. One example of this filtering is that hysteresis is applied to some control signals, in order to ensure that single bit errors do not cause unstable operation.
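The hysteresis mentioned above can be pictured as a filter that only accepts a new value after it has been observed several times in a row. The Python sketch below shows one such scheme; the threshold of three consecutive samples and the function name are assumptions for illustration, since the actual filter parameters of the design are not given here.

```python
def hysteresis_filter(samples, threshold: int = 3, initial: int = 0) -> list:
    """Filter a binary control signal so that isolated bit errors are ignored.

    The output changes only after `threshold` consecutive input samples
    disagree with the current output value.
    """
    state = initial
    disagree_run = 0
    filtered = []
    for sample in samples:
        if sample == state:
            disagree_run = 0
        else:
            disagree_run += 1
            if disagree_run >= threshold:
                state = sample
                disagree_run = 0
        filtered.append(state)
    return filtered
```

With this scheme a single flipped bit in an otherwise stable control signal never reaches the output, at the cost of delaying genuine changes by the threshold number of samples.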

The timing functionality differs depending on the operating mode. For example, the transmission timing is driven by the received link timing and received link delay in Slave mode. However, for Master and Partner mode, the transmit timing is driven by a signal that is received from the Sync block mentioned above.

Furthermore, the CPRI Control block supervises the state of Layer 1 to detect critical status changes such as Loss of Signal and Line Synchronization status. The status of Layer 1 partly determines which state Layer 2 should be in, and the CPRI Control block takes care of the supervision and signalling required to ensure proper operation in this aspect. The block also handles the L1 Inband Signaling that can be used to communicate directly with Layer 1 of the unit on the other side of the CPRI link.

4.2.5 CPRI RX

For each basic frame received from Layer 1, the CPRI RX block separates the control word from the IQ data. Furthermore, the block extracts data from the control word and forwards the data to the correct service interface bridge. The data size and conditions for which data to forward to each channel depend on the line rate and operating mode.

4.2.6 CPRI TX

The main functionality of the CPRI TX block is to create basic frames to transmit. Similarly to the process in the CPRI RX block (but reversed), the block assembles a complete control word and then combines it in the correct order with the IQ data. Also here, the data size and conditions for which data to read from each channel depend on the line rate and operating mode.
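The complementary roles of the TX and RX blocks can be sketched as a pair of functions that assemble and split a basic frame. The Python model below is heavily simplified. It assumes a basic frame of 16 words in which the first word is the control word and the remaining 15 carry IQ data, laid out contiguously. The word lengths in WORD_BITS follow the convention that the word length scales with the line rate; they are stated from general knowledge of the CPRI specification and should be checked against [3] and Section 2.1.3. All names are hypothetical.

```python
WORDS_PER_BASIC_FRAME = 16

# Assumed word length in bits per line rate option (verify against [3]).
WORD_BITS = {"2p5": 32, "4p9": 64, "9p8": 128, "10p1": 160}


def word_bytes(rate: str) -> int:
    """Size of one word in bytes for the given line rate key."""
    return WORD_BITS[rate] // 8


def assemble_basic_frame(control_word: bytes, iq_data: bytes, rate: str) -> bytes:
    """TX direction: place the control word first, followed by the IQ data."""
    size = word_bytes(rate)
    if len(control_word) != size:
        raise ValueError("control word has wrong length for this line rate")
    if len(iq_data) != size * (WORDS_PER_BASIC_FRAME - 1):
        raise ValueError("IQ data has wrong length for this line rate")
    return control_word + iq_data


def split_basic_frame(frame: bytes, rate: str) -> tuple:
    """RX direction: separate the control word from the IQ data."""
    size = word_bytes(rate)
    if len(frame) != size * WORDS_PER_BASIC_FRAME:
        raise ValueError("frame has wrong length for this line rate")
    return frame[:size], frame[size:]
```

The dependence on line rate is visible in the sizes alone: under these assumptions a basic frame is 64 bytes for the 2.5 Gbit/s option and 320 bytes for the 10.1 Gbit/s option, which is one reason hard coding the line rate lets the compiler remove unused data paths.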

Furthermore, this block contains some special functionality to ensure that the data stays aligned for the 10.1 Gbit/s line rate option even though a different encoding scheme is used there^∗.

4.2.7 Gearbox

The Gearbox is used to convert between different bus widths in the L1 and L2 blocks. However, the functionality of the block is more complex for the 10.1 Gbit/s line rate option. This is because, due to the higher data rate and the different encoding scheme, bits in certain cases need to be stored before being sent out in the next cycle.
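The basic principle of a gearbox, including the need to hold back bits until a full output word is available, can be shown with a small accumulator model. The Python sketch below is a generic width converter and not the thesis implementation; the function name and the MSB-first bit ordering are assumptions.

```python
def gearbox(words, in_width: int, out_width: int) -> tuple:
    """Convert a stream of `in_width`-bit words to `out_width`-bit words.

    Bits are packed MSB first (assumed ordering). Returns the list of
    complete output words and the number of bits still held in the
    accumulator, waiting for more input.
    """
    in_mask = (1 << in_width) - 1
    out_mask = (1 << out_width) - 1
    accumulator = 0
    held_bits = 0
    output = []
    for word in words:
        accumulator = (accumulator << in_width) | (word & in_mask)
        held_bits += in_width
        # Emit as many complete output words as the accumulator holds.
        while held_bits >= out_width:
            held_bits -= out_width
            output.append((accumulator >> held_bits) & out_mask)
            accumulator &= (1 << held_bits) - 1
    return output, held_bits
```

When the two widths are not integer multiples of each other, the second return value is non-zero after some input words, which corresponds to the bits that have to be stored until the next cycle.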

∗ 64b66b line coding instead of 8b10b line coding


4.2.8 Reset Ctrl

This block handles the reset functionality for the whole CPRI L2 block and all its sub blocks.

4.3 Creation of PR Personas

In the CPRI L2 block, line rate and operating mode depend on values in the SW Control Register. In order to reduce the logic utilization of the block and create personas for the PR region, different values of line rate and operating mode were hard coded in the HDL code. As described in the previous sections, the functionality of many of the sub blocks depends on one or both of these settings. Thus, the hard coded settings made the compiler remove logic that was not needed for the configuration of each persona compared to the flat design. Creating a unique persona for each of the possible combinations of settings as well as an empty persona yielded a total of thirteen different personas.

Table 4.1: Overview of the PR personas and their settings.

                         Mode
Line Rate       Master     Slave      Partner
2.5 Gbit/s      m 2p5      s 2p5      p 2p5
4.9 Gbit/s      m 4p9      s 4p9      p 4p9
9.8 Gbit/s      m 9p8      s 9p8      p 9p8
10.1 Gbit/s     m 10p1     s 10p1     p 10p1
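The count of thirteen personas follows from four line rates times three operating modes, plus the empty persona. The Python sketch below enumerates them. The underscore-separated identifiers and the name "empty" are hypothetical spellings used only for this example.

```python
from itertools import product

LINE_RATES = ["2p5", "4p9", "9p8", "10p1"]            # Gbit/s options from Table 4.1
MODES = {"m": "Master", "s": "Slave", "p": "Partner"}


def persona_names() -> list:
    """Enumerate one persona per (line rate, mode) pair plus an empty one."""
    names = [f"{mode}_{rate}" for rate, mode in product(LINE_RATES, MODES)]
    names.append("empty")
    return names
```

Calling persona_names() yields a list of thirteen unique entries, starting with the three 2.5 Gbit/s personas.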


Chapter 5 Implemented PR Design

In order to evaluate the aspects of interest, a PR system was implemented on an Intel Arria 10 development board. This provided an opportunity to go through the design flow from the bottom up, giving insight into special considerations and obstacles that arise when working with PR as opposed to flat FPGA designs. The implemented PR system then acted as a test and evaluation platform.

5.1 Development Board

The development board used for this thesis was the Intel Arria 10 GX Transceiver Signal Integrity Development Kit [45], featuring an Arria 10 GX1150 FPGA (10AX115F1932C) as well as a variety of off-chip components and accompanying evaluation software. The components mainly used in the implementation at hand are the flash memory (see Section 5.2.4) and the Embedded USB-Blaster which is used for computer-to-FPGA communication as well as full reconfiguration of the whole FPGA. Furthermore, a few on-board LEDs were used in the initial, basic PR design.


5.2 Architectural Design

5.2.1 Overview

[Figure 5.1 block labels: External flash memory; FPGA containing Flash memory interface, DMA/PR Controller, PR IP Core and PR Region.]

Figure 5.1: Overview of the system implemented during this thesis project.

The PR system designed and implemented during the work on this thesis can be seen in Figure 5.1. It is designed for evaluation of and experimentation with PR designs on Intel Arria 10 FPGAs. The custom-designed DMA/PR controller block is the central control unit of the system, and can in turn be controlled via System Console^∗ on an external computer. During partial reconfiguration of the PR region, the DMA/PR controller block reads data from the flash memory on the development board into an internal FIFO. It then forwards the data and handles the control of and communication with the PR IP block, which handles the actual reconfiguration of the PR region.

The PR system was designed using a combination of Qsys Pro^† and pure HDL coding. The different on-chip logical blocks all communicate via Intel’s Avalon Memory-Mapped (AVMM) interface standard.

∗ A built-in tool in Quartus Pro which can be used to communicate directly with circuits on the FPGA via a JTAG bridge.

^† A built-in hardware system design tool in Quartus Pro.
