Interfacing an external Ethernet MAC/PHY to a MicroBlaze system on a Virtex-II FPGA


Master’s thesis

performed for ITEE, University of Queensland, Brisbane, Australia

by Johan Bernspång

Reg nr: LiTH-ISY-EX-3440-2004


Master’s thesis

performed in Computer Engineering, Dept. of Electrical Engineering

at Linköpings universitet by Johan Bernspång, Reg nr: LiTH-ISY-EX-3440-2004

Supervisors: Doctor John Williams, ITEE, University of Queensland

Professor Neil Bergmann, ITEE, University of Queensland

Examiner: Professor Dake Liu, Linköpings universitet

Linköping, 31 May 2004



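Division, Department: Computer Engineering, Dept. of Electrical Engineering, 581 83 Linköping

Date: 31 May 2004

ISRN: LITH-ISY-EX-3440-2004

URL for electronic version: http://www.ep.liu.se/exjobb/isy/2004/3440/

Title: Interfacing an external Ethernet MAC/PHY to a MicroBlaze system on a Virtex-II FPGA

Swedish title: Utveckling av ett gränssnitt mellan ett externt ethernetchip och ett Microblaze system på en Virtex-II FPGA (Development of an interface between an external Ethernet chip and a Microblaze system on a Virtex-II FPGA)

Author: Johan Bernspång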


Abstract

Due to the development towards denser programmable devices (FPGAs), it is today possible to fit a complete embedded system, including a microprocessor, bus architecture, memory and custom peripherals, onto a single reprogrammable chip. Such a system is called a System-on-Chip (SoC). The custom peripherals can be of almost any nature, from I/O interfaces to Ethernet Media Access Controllers. The latter core, however, usually consumes a large part of a good-sized FPGA. The purpose of this thesis is to explore the possibilities of interfacing an FPGA-based Microblaze system to an off-chip Ethernet MAC/PHY. Such a solution would consume a smaller part of the targeted FPGA, thus giving room for other on-chip peripherals or enabling the use of a smaller FPGA. Employing a smaller FPGA is desirable since it reduces power consumption and device price. This work includes the evaluation of available Ethernet devices, the choice of interface technology, and the implementation, testing and verification of the interface. Since the ISA interface is still a common interface to Ethernet MAC devices, a bus bridge is implemented linking the internal On-Chip Peripheral Bus (OPB) with the ISA bus. Due to delivery delays of the selected Ethernet chip, a small on-chip ISA peripheral was constructed to provide a tool for testing and verifying the bus bridge. The main result of this work is an OPB to ISA bus bridge core. The bridge was determined to work according to specification, and with this report at hand the connection of the Ethernet chip to the system should be quite straightforward.

Keywords: OPB, ISA, Microblaze, FPGA, VHDL, Ethernet, ChipScope


List of Figures

2.1 Schematic of a CLB slice

2.2 The Microblaze Architecture

2.3 Standard 16-bit I/O device ISA bus cycle

2.4 Basic OPB transaction

2.5 Circuit for moving data across clock boundary

3.1 OPB to ISA bridge architecture

3.2 The OPB Finite state machine

3.3 The ISA Finite state machine

3.4 The Timeout Watchdog Finite state machine

3.5 The ISA GPIO core

4.1 OPB to ISA read operation

4.2 OPB to ISA write operation

4.3 OPB to ISA read operation with failure

4.4 OPB to ISA write operation with failure

D.1 ChipScope Inserter, design specification

D.2 ChipScope Inserter, trigger parameters

D.3 ChipScope Inserter, capture parameters

D.4 ChipScope Inserter, net connections

D.5 ChipScope Analyzer trigger setup

D.6 ChipScope Analyzer waveform capture

G.1 Simulation of unsuccessful read cycle

G.2 Simulation of successful read cycle

G.3 Simulation of unsuccessful write cycle

G.4 Simulation of successful write cycle

List of Tables

3.1 Bus bridge specification

4.1 Device utilization

List of Listings

E.1 Bus Wrapper

E.2 OPB Interface

E.3 ISA Interface

E.4 Timeout Watchdog

E.5 ISA Clock Generator

E.6 OPB to ISA communication

E.7 Clock Domain Crossing

E.8 ISA GPIO

Chapter 1

Introduction

1.1 Background

One of the research areas of the School of Information Technology and Electrical Engineering (ITEE) at the University of Queensland in Brisbane, Australia, is reconfigurable Systems-on-Chip (rSoC), particularly for the implementation of real-time embedded systems, where custom hardware peripherals can improve real-time response rates. The research is mainly based on a Field Programmable Gate Array (FPGA) with a MicroBlaze softcore processor from Xilinx running a version of embedded Linux, uClinux. The port of uClinux to Microblaze was made by a member of the research group, John Williams, who is also one of the supervisors of this project. The group has proposed a future platform for real-time reconfigurable systems-on-chip, Egret, which would be based on a Microblaze softcore processor running uClinux.[1]

One of the key capabilities of an embedded real-time system is efficient communication with the surrounding environment. An Ethernet core can easily be added to the system as an on-chip peripheral on the FPGA, but with the significant drawback that it occupies a comparatively large part of the chip. It thus takes room from other potential peripherals or real-time tasks implemented in hardware, or forces the use of a larger and more expensive FPGA. This master's thesis is an attempt to interface a standard Ethernet Media Access Controller/Physical interface (MAC/PHY) chip with the FPGA in a general way, using as little of the FPGA as possible. The aim of a small solution is to make room for other time-critical tasks to be implemented in hardware. A small solution would also enable the use of a smaller FPGA device, which would be cheaper and consume less power. Price and power consumption are two drawbacks of employing FPGAs instead of ASIC technology.

The first task of this project was a thorough investigation of previous attempts to interface an Ethernet MAC/PHY with a Microblaze system. No such solution was found, however; all previous solutions used on-chip Ethernet MACs together with off-chip physical interfaces. The second task was to implement the interface between the Microblaze system and the chosen Ethernet MAC/PHY and to review the involved technologies. Finally, the implementation had to be carefully tested and verified to ensure its functionality.

The thesis work was carried out as part of the requirements for the master of engineering degree at Linköping University, Sweden. The examiner is Dake Liu, professor in computer engineering at the Department of Electrical Engineering.


1.2 Objectives

The main objective of this thesis is to find a suitable Ethernet MAC/PHY for the existing system and create an interface between the chip and the Microblaze system. In order to reach this objective a number of goals were identified:

• Evaluation of different Ethernet MAC/PHY chips to find a suitable solution.

• Review and evaluation of the utilized FPGA technology and the Microblaze system, the development tools and the development of Microblaze peripherals. The review should cover the bus architecture utilized by Microblaze systems and the design of custom peripherals.

• Review of the interface technology between the chosen Ethernet chip and the Microblaze system. Design challenges should be identified and evaluated.

• Development of an interface between the Ethernet chip and the system. Preferably, this interface should be an easy-to-use IP core that uses as small a part of the FPGA as possible. The IP should be implemented in a general way, so that it can be used in other applications besides the one described in this thesis.

1.3 Method

The first part of the work was to learn how to use the tools for creating embedded systems on Xilinx devices; tutorials on EDK and ISE were used for this purpose. Second, a suitable Ethernet MAC/PHY was chosen in order to identify what kind of interface had to be constructed. Third, the technologies involved in the interface, such as the On-Chip Peripheral Bus (OPB), the Industry Standard Architecture (ISA) bus, the utilized FPGA family and the Microblaze softcore processor, were investigated to give an understanding of how the custom core was to be implemented.

When the topics mentioned in the previous paragraph had been thoroughly studied in various publications, an OPB to ISA bus bridge was designed and implemented. To verify the design both simulation and on-chip verification were performed.

1.4 Limitations

Due to the author's schooling, this thesis is written in American English. Another limitation is the availability of software and hardware. For an in-depth description of the resources available for the implementation and testing of the OPB to ISA bus bridge, see appendix C. The software tools proved to be satisfactory for this kind of development.

1.4.1 Time

This thesis work has been performed over a period of 20 weeks in accordance with the requirements at Linköping University.

1.5 Thesis outline

Chapter 2, Technology & Background, will give the reader sufficient background information regarding the different technologies that will be utilized throughout the thesis.


Chapter 3, Implementation & Testing, covers the different options of Ethernet chip as well as the design and implementation of the bus bridge and a very simple on-chip ISA peripheral. Furthermore, it describes the testing and verification methodologies.

Chapter 4, Results & Discussion, presents and discusses the outcomes from the work. Chapter 5, Conclusions, sums up the report and gives some advice for future work. The appendices contain a list of acronyms, acknowledgements, description of hardware and software resources, a short guide and tutorial to ChipScope, the source code for the VHDL implementations, details about device utilization, and simulation waveforms.


Chapter 2

Technology & Background

This chapter will cover the underlying theories for the technologies that are utilized throughout this thesis. The technologies include SoC in general, the employed FPGA family, Virtex-II from Xilinx, Ethernet, the OPB and ISA specifications, guidelines to design of custom OPB peripherals, and how to provide a secure data path across clock domain boundaries. For a more in-depth coverage, see the referred literature.

2.1 Systems-on-Chip, SoC

In recent years, embedded computer systems, used for example as vehicle control units and in wireless communication systems, have become increasingly complex. At the same time, the feature size of integrated circuits has decreased significantly, making room for more transistors on one piece of silicon than ever before.¹ The trend towards denser silicon technologies has led to more sophisticated FPGA circuits that allow numerous functions to be integrated onto a single silicon chip, so that they can be used for a wider range of applications. Instead of using separate chips for the different parts of the system, e.g. one for the processor core or microcontroller, one for the I/O interface and one for memory, a SoC integrates all of these parts on the same piece of silicon.

One recent development is the reconfigurable System-on-Chip, even though the approach was first proposed in the 1960s. In such a system the hardware can be reprogrammed during runtime: one hardware function can be exchanged for another, or software processes can be moved into hardware to meet timing requirements, depending on the demands of the system. When designing a reconfigurable SoC it is possible to use more hardware than is available on the target chip, since peripheral netlists are kept in memory and loaded onto the device when necessary.[3]

2.2 Field Programmable Gate Array, FPGA

Before the appearance of programmable logic, hardware designers either built custom logic circuits at the board level using standard components, or at the gate level relying on expensive ASIC technologies. Today designers can use FPGA technology to verify the functionality of a system during the design flow. An FPGA is a generic integrated circuit consisting of configurable logic blocks (CLBs) and programmable interconnections. A design is implemented by specifying a simple logic function for each cell and closing the appropriate switches in the interconnection network. Modern FPGA circuits contain enough logic to implement SoCs and other complex designs.[4]

¹ At the time of writing, the smallest feature size in production is 90 nm.[2]

The term Field Programmable implies that the function of the FPGA is defined by a user description rather than by the manufacturer of the circuit. The user employs a hardware description language (HDL), such as VHDL or Verilog, to describe the desired system. Synthesis software is then used to translate the description into a bit file that is used to program the FPGA.[4] The configuration can be stored in one-time programmable memory cells, non-volatile memory (flash memory) or volatile memory (static and dynamic RAM). For reconfigurable systems-on-chip the latter is of particular interest: an SRAM-based FPGA can be programmed during system initialization, and it can also be changed dynamically during system operation.[5]

FPGA circuits are well suited for prototyping and small production volumes. The use of FPGAs also enables future hardware updates: if bugs are discovered in the implementation, a new bit file can be downloaded to the device in a very short time without disturbing the execution of the system too much. However, since an FPGA is not optimized for a specific application, it may consume more power and implement the design less efficiently than an ASIC. Another drawback is the high price per chip.[4]

2.2.1 Virtex-II

Today, the FPGA market is dominated by three companies: Xilinx, Altera and Actel.[5] Xilinx has developed a number of FPGA families. The Virtex-II family is aimed at high-performance designs based on customized modules and IP cores. The family consists of 11 members, ranging from 40K to 8M system gates.[6] The Virtex-II platform is the result of the largest development effort in the history of programmable logic, and it has many novel features that simplify the design and implementation of complex systems. A Virtex-II FPGA has, for instance, up to 16 low-skew clock domains² and on-chip controlled output impedances that eliminate the need for external termination resistors.[7]

Figure 2.1: A rough schematic of a CLB slice, showing the two LUTs (F and G), two registers, arithmetic logic and multiplexers. Source: [6]

The Virtex-II FPGA consists of input/output blocks (IOBs) and internal logic blocks. The interface between the external pins and the internal configurable logic is provided by the IOBs. An IOB can be used in three different ways: as an input block with either a single-data-rate or a double-data-rate (DDR) register; as an output block with a single-data-rate or DDR register and an optional 3-state buffer that can be driven directly or through a single or DDR register; or, finally, as a bidirectional block with any combination of the above configurations. The internal logic blocks have four major elements. CLBs provide the functional elements for combinatorial and synchronous logic. Dual-port block RAM modules provide large 18 Kbit storage elements. Multiplier blocks provide dedicated 18x18-bit multipliers. Finally, the digital clock managers (DCMs) provide self-calibrating, fully digital, delay-compensated solutions for clock multiplication and division, and for coarse- and fine-grained clock phase shifting.[6]

Every CLB consists of four slices, which are the main components of the CLB. The concept of slices provides a good measure of device utilization during synthesis. Each slice is composed of two 4-input function generators, two storage elements, arithmetic and carry logic, and multiplexers, see figure 2.1. The function generators are mainly used as lookup tables (LUTs), but can also be used as 16-bit distributed RAM or as 16-bit variable-tap shift registers. A LUT can implement any 4-input Boolean function, and its output drives both the slice output and the input of the corresponding register in the slice. The multiplexers in the Virtex-II slices combine LUT outputs, which makes it possible to implement logic functions with up to 8 inputs. In a Virtex-II, the storage elements can be configured either as edge-triggered D flip-flops or as level-sensitive latches.[6]
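A 4-input LUT can be thought of as a 16-entry truth table: the four inputs form an index that selects one bit of a 16-bit configuration word. The C sketch below models this. The bit ordering (input i0 as the least significant index bit, as in the INIT attribute of Xilinx LUT primitives) is an assumption for illustration, and the function names `lut4_eval`, `lut4_init_from`, `and4` and `xor4` are hypothetical.

```c
#include <stdint.h>

/* Evaluate a 4-input LUT modeled as a 16-entry truth table.
   Bit k of `init` is the output for the input combination whose
   binary value is k (i3 = most significant, i0 = least significant). */
static int lut4_eval(uint16_t init, int i3, int i2, int i1, int i0)
{
    unsigned index = ((unsigned)(i3 & 1) << 3) | ((unsigned)(i2 & 1) << 2) |
                     ((unsigned)(i1 & 1) << 1) | (unsigned)(i0 & 1);
    return (int)((init >> index) & 1u);
}

/* Build the truth table of an arbitrary 4-input Boolean function,
   which is what configuring the LUT amounts to. */
static uint16_t lut4_init_from(int (*f)(int, int, int, int))
{
    uint16_t init = 0;
    for (unsigned k = 0; k < 16; k++) {
        if (f((k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1))
            init |= (uint16_t)(1u << k);
    }
    return init;
}

/* Example Boolean functions used to configure the LUT. */
static int and4(int a, int b, int c, int d) { return a & b & c & d; }
static int xor4(int a, int b, int c, int d) { return a ^ b ^ c ^ d; }
```

For example, a 4-input AND gives the configuration word 0x8000 (only the last table entry is 1), and 4-input parity gives 0x6996; any other function of four inputs fits in the same 16 bits.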

2.2.2 The Microblaze core

To enable the development of more powerful embedded solutions on Xilinx FPGAs, the company provides a soft core processor³, Microblaze™. The core is a high-performance 32-bit RISC processor that runs at 100 MHz on a Virtex-II device. It has a three-stage pipeline with separate instruction and data paths (Harvard architecture). The core is illustrated in figure 2.2.[8]

Figure 2.2: The Microblaze core block diagram, showing the instruction-side bus interface (ILMB, IOPB), the data-side bus interface (DLMB, DOPB), the program counter, instruction buffer, instruction decode, 32 x 32-bit register file, and the add/sub, shift/logical and multiply units. Source: [9]

The Microblaze core comprises two bus interfaces, one for the instruction path and one for the data path. Each of the interfaces is divided into an interface for the On-chip Peripheral Bus (OPB) and an interface for the Local Memory Bus (LMB). The OPB provides the connection to on- and off-chip peripherals and memory, while the LMB provides single-cycle access to on-chip dual-port block RAM. A Microblaze platform must include one data path and one instruction path; it will therefore use two, three or four bus interfaces.[9]

³ A soft core processor has a configurable architecture: custom peripherals and/or instructions can be added prior to synthesis.

The Microblaze architecture employs a big-endian bit-numbering convention: the most significant bit (MSB) of a vector is bit zero. When designing custom cores, or using pre-designed peripherals, care has to be taken when connecting vectors with different bit-numbering conventions.[9]
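Connecting an MSB-0 vector, such as a Microblaze bus signal, to an LSB-0 vector amounts to mirroring the bit indices. A minimal C sketch of that mapping follows; the helper names `msb0_to_lsb0` and `get_bit_msb0` are hypothetical.

```c
#include <stdint.h>

/* Convert a bit index in MSB-0 numbering (bit 0 = most significant,
   as on the Microblaze side) into LSB-0 numbering (bit 0 = least
   significant) for a vector of `width` bits. The mapping is its own
   inverse, so it also converts LSB-0 indices back to MSB-0. */
static unsigned msb0_to_lsb0(unsigned msb0_index, unsigned width)
{
    return width - 1u - msb0_index;
}

/* Read bit `msb0_index` (MSB-0 numbering) of a 32-bit word. */
static unsigned get_bit_msb0(uint32_t word, unsigned msb0_index)
{
    return (word >> msb0_to_lsb0(msb0_index, 32u)) & 1u;
}
```

For a 32-bit vector, MSB-0 bits 16 to 31 correspond to LSB-0 bits 15 down to 0, i.e. the sixteen least significant bits.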

2.2.3 Embedded Development Kit, EDK

To facilitate the development of embedded systems based on FPGAs from Xilinx, the company provides the Embedded Development Kit, EDK. EDK comprises the whole tool chain that is needed to transform a specification into a running embedded system, as well as the Microblaze core itself and a large set of other peripherals. Among the supplied peripherals are UARTs, memory controllers, an OPB to PCI bus bridge and an Ethernet MAC.[10]

The system architecture is specified in a Microprocessor Hardware Specification (MHS) file. In this file each core of the system is configured and given an address space. The processor and bus architecture are specified together with all the peripherals, and local and global ports are defined. The MHS file is the input to the platform generator tool, which constructs an embedded system in the form of hardware netlists. In the corresponding Microprocessor Software Specification (MSS) file, the type of software driver for every core of the system is specified. The MSS file is the input to the library generator tool, which is used to configure the peripheral device drivers and libraries.[10] The system netlist is the input to a chain of tools that carries out mapping of the system, placement of the logic functions onto CLBs, routing of the interconnects, and finally generation of the bit file that is downloaded to the FPGA.[11]

2.3 Ethernet MAC/PHY

In order to communicate with a surrounding network, or with the Internet, an embedded system needs some sort of network interface. The most widespread network standard used today is IEEE 802.3, also known as Ethernet. An Ethernet adapter consists of two parts: the Media Access Controller, MAC, which controls the transactions, and the Physical layer device, PHY, which provides the physical connection to the surrounding network.[12]

2.3.1 Media Access Controller, MAC

The Media Access Controller determines whether or not the device has access to the medium. The MAC mechanism is based on a scheme called Carrier Sense Multiple Access with Collision Detection (CSMA/CD). Carrier sense means that all devices must listen for a period of quiet before attempting to send. Multiple access implies that when it has been quiet long enough, all devices have an equal chance of sending. Finally, collision detection means that if two devices start to send at the same time, the collision is detected and both abort their attempts.[12]
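The text does not describe how the colliding devices retry. In IEEE 802.3 (not covered by the sources above), a station waits a random number of slot times chosen by truncated binary exponential backoff: after the n-th collision of a frame it picks an integer r with 0 ≤ r < 2^min(n, 10), and it gives up after 16 failed attempts. The sketch below models that rule; the names are hypothetical, and `rand()` is used only for illustration.

```c
#include <stdlib.h>

#define BACKOFF_LIMIT  10    /* exponent stops growing after 10 collisions */
#define ATTEMPT_LIMIT  16    /* frame is dropped after 16 failed attempts */
#define SLOT_TIME_BITS 512L  /* one slot time, in bit times */

/* Largest backoff (in slot times) allowed after the n-th consecutive
   collision of a frame. Returns -1 when the frame must be abandoned
   (excessive collisions), and 0 when there has been no collision. */
static long max_backoff_slots(int n)
{
    if (n < 1) return 0;
    if (n >= ATTEMPT_LIMIT) return -1;
    int k = n < BACKOFF_LIMIT ? n : BACKOFF_LIMIT;
    return (1L << k) - 1;
}

/* Pick a random delay, in bit times, after the n-th collision.
   rand() % (max + 1) is slightly biased but adequate for a sketch. */
static long backoff_delay_bits(int n)
{
    long max = max_backoff_slots(n);
    if (max < 0) return -1;
    long r = (long)rand() % (max + 1);
    return r * SLOT_TIME_BITS;
}
```

At 10 Mbit/s one slot time is 51.2 µs, so after the first collision a station waits either 0 or 51.2 µs, and the possible delay doubles with each further collision up to the tenth.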

The Ethernet protocol sends data, together with additional information, in packets. Each packet is preceded by a seven-byte preamble of alternating ones and zeros, used for synchronization, followed by a one-byte start-of-frame delimiter. The first 14 bytes of the packet form the header, which contains the destination address, the source address and the data type. The header is followed by 46 to 1500 bytes of data; if a packet carries less than 46 bytes of data, it has to be padded with unused bytes. Finally, the data section is followed by an error detection field. Depending on the physical layer, the packets are transferred one to four bits at a time over the shared channel to every connected device. After each packet transmission, all devices have an equal chance of making the next transmission.[12, 13]
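The frame layout gives simple arithmetic for frame sizes. The sketch below computes the length of a frame from its payload size (header, padded data and error detection field, excluding preamble and start-of-frame delimiter) and the checksum used for error detection. That the error detection field is a 4-byte CRC-32 frame check sequence comes from IEEE 802.3, not from the text above; the function names are hypothetical, and the bitwise CRC is written for clarity rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN  14u   /* destination, source, type */
#define ETH_MIN_PAYLOAD 46u   /* shorter payloads are padded */
#define ETH_MAX_PAYLOAD 1500u
#define ETH_FCS_LEN     4u    /* CRC-32 frame check sequence */

/* Frame length in bytes, from destination address to FCS, for a given
   payload size. Returns 0 if the payload does not fit in one frame. */
static size_t eth_frame_len(size_t payload_len)
{
    if (payload_len > ETH_MAX_PAYLOAD) return 0;
    size_t data = payload_len < ETH_MIN_PAYLOAD ? ETH_MIN_PAYLOAD : payload_len;
    return ETH_HEADER_LEN + data + ETH_FCS_LEN;
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR 0xFFFFFFFF), the checksum used for the Ethernet FCS. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The shortest frame is thus 64 bytes and the longest 1518 bytes; on the medium each frame is additionally preceded by the 8 bytes of preamble and start-of-frame delimiter.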

2.3.2 Physical Layer Device, PHY

Several types of media are used for Ethernet networks: coaxial and twisted-pair cables, and optical fibers. There are three common interfaces used to connect the Ethernet MAC to the physical layer: the Attachment Unit Interface (AUI), the Media Independent Interface (MII) and the Reduced MII (RMII). The AUI is an old interface for 10 Mbit/s connections only, while MII and RMII were introduced with the 10/100 Mbit/s Ethernet standard. MII and RMII offer some parallelism: MII transfers data four bits at a time and RMII two bits at a time. RMII was developed to reduce the pin count, at the cost of a higher clock frequency. Since the physical layer of an Ethernet device conforms to these standards, the device itself does not need to know which type of PHY it uses.[14]
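The trade-off between pin count and clock frequency follows from dividing the bit rate by the number of data bits moved per clock cycle. The sketch below assumes one transfer per clock cycle; the function and constant names are hypothetical. It applies to MII at both speeds and to RMII at 100 Mbit/s; at 10 Mbit/s RMII keeps its 100 Mbit/s reference clock and repeats the data instead, which this simple formula does not model.

```c
#include <stdint.h>

/* Data path widths of the two interfaces, in bits per clock cycle. */
enum { MII_WIDTH = 4, RMII_WIDTH = 2 };

/* Clock frequency (Hz) needed to move `bit_rate` bits/s over a data
   path `width` bits wide, with one transfer per clock cycle. */
static uint32_t interface_clock_hz(uint32_t bit_rate, uint32_t width)
{
    return bit_rate / width;
}
```

At 100 Mbit/s this gives 25 MHz for MII and 50 MHz for RMII: halving the number of data pins doubles the clock frequency.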

2.3.3 The OPB Ethernet MAC

The Ethernet MAC core from Xilinx is designed for 10/100 Mbit/s communication through an MII, either to the PHY provided on the development board used in this project or to another PHY. On-chip, the MAC is connected to the 32-bit-wide OPB for communication with the rest of the system. To achieve 10 Mbit/s Ethernet communication, the system clock must be at least 6.5 MHz. Dividing this minimum clock frequency by the number of 32-bit OPB data transfers per second gives the number of OPB clock cycles per transfer, see the calculation below.[13]

$$\frac{6\,500\,000}{10\,000\,000/32} = 20.8 \approx 21$$

That is, on average about 21 OPB clock cycles are used for each 32-bit Ethernet MAC transfer.

The MAC has a number of design parameters that provide flexibility for the system designer. There are, however, two major drawbacks with the Xilinx Ethernet MAC. First, the core is not free of charge: a license fee has to be paid in order to use it for more than a few hours at a time. Second, on a Virtex-II FPGA the core consumes at least 1555 slices, which is quite a lot if device size or power consumption is an issue.[13]

2.4 Bus architectures

Two different bus architectures are involved in the design in this thesis: the older ISA bus, and the newer OPB.

2.4.1 The ISA bus specification

Originally, in the IBM PC, the Industry Standard Architecture bus was only 8 bits wide, but with the evolution towards the IBM PC/AT it was expanded to a 16-bit data width with some additional functionality. Later, the bus was extended to 32 bits with the advent of the Extended Industry Standard Architecture (EISA). Even today, the ISA bus is one of the most common interfaces among Ethernet devices. There are a number of different ISA bus cycles, with 8-bit or 16-bit data width, for communication with processors or direct memory access (DMA) devices.[15] This section only deals with the 16-bit version of the ISA bus communicating with a microprocessor.


The ISA bus can be divided into three parts: the address bus, the data path, and the control and timing signals. A hash sign (#) after a signal name indicates that the signal is active low. Signals used during memory operations, such as DMA, are omitted, since the implementation in this thesis only works in I/O mode.

• Address bus: The system address bus, SA, consists of the lower 20 bits of the microprocessor address. During address time, Ts, the address is latched onto the address bus and is visible to all ISA peripherals. If address pipelining is used by the peripheral, bits 17 to 23 of the microprocessor address bus can be presented prior to Ts. These bits are called the latchable address bus, LA⁴. Some processors and peripherals also make use of the system bus high enable, SBHE#, to indicate that the upper half of the data bus, SD(15:8), will be transferring a byte to an odd address.[15] The Ethernet chip targeted in this implementation, however, uses this signal in a completely different way; see section 3.2.2.[16]

• Data path: The system data bus, SD, is used to transfer data during the data time, Tc, of a bus cycle. The data bus is bidirectional, i.e. it is used for both read and write operations. If a peripheral is 8-bit, only the lower path⁵ is used. For 16-bit transfers, the lower path is used to transfer data to even-addressed locations, and the upper path is used to transfer data to odd-addressed locations.[15]

• Control and timing signals: Depending on whether the ISA cycle is an I/O read or an I/O write operation, the corresponding control signal, IOR# or IOW# respectively, is asserted during Tc.[15]
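The address and data path rules above can be combined into a small decoder that tells which byte lanes of SD a transfer uses. This C sketch follows the standard SBHE#/SA0 convention described in the list (not the different use of SBHE# by the targeted Ethernet chip), treats the fourth SBHE#/SA0 combination as unused, and uses hypothetical names throughout.

```c
#include <stdbool.h>
#include <stdint.h>

#define ISA_SA_MASK 0xFFFFFu  /* SA carries the lower 20 address bits */

/* Byte lanes of SD used by a transfer. */
enum isa_lanes {
    LANES_NONE = 0,
    LANE_LOW   = 1,                   /* SD(7:0),  even address */
    LANE_HIGH  = 2,                   /* SD(15:8), odd address  */
    LANES_BOTH = LANE_LOW | LANE_HIGH /* 16-bit word            */
};

/* Address as seen on the SA bus. */
static uint32_t isa_sa(uint32_t cpu_addr)
{
    return cpu_addr & ISA_SA_MASK;
}

/* Decode which data lanes carry data, given the processor address and
   the level of the active-low SBHE# signal (false = asserted). */
static enum isa_lanes isa_lanes_used(uint32_t cpu_addr, bool sbhe_n)
{
    bool sa0 = (isa_sa(cpu_addr) & 1u) != 0;
    if (!sbhe_n && !sa0) return LANES_BOTH; /* word at even address     */
    if (!sbhe_n &&  sa0) return LANE_HIGH;  /* byte at odd address      */
    if ( sbhe_n && !sa0) return LANE_LOW;   /* byte at even address     */
    return LANES_NONE;                      /* combination not used here */
}
```

For example, a 16-bit access to I/O address 0x300 with SBHE# asserted uses both lanes, while a byte access to 0x301 uses only SD(15:8).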

The ISA bus clock, BCLK, is derived from the system clock. Usually the ISA clock has a frequency between 8 and 8.33 MHz. Some peripherals, however, allow frequencies up to 11 MHz⁶ or more.[15]
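Deriving BCLK from a faster system clock is typically done with an integer divider. The sketch below picks the smallest integer divisor that keeps BCLK at or below a chosen maximum. The 100 MHz system clock and the 11 MHz limit are example values, and the function names are hypothetical; this is not the ISA clock generator of Listing E.5.

```c
#include <stdint.h>

/* Smallest integer divisor N such that sys_clk_hz / N <= max_bclk_hz. */
static uint32_t bclk_divisor(uint32_t sys_clk_hz, uint32_t max_bclk_hz)
{
    return (sys_clk_hz + max_bclk_hz - 1u) / max_bclk_hz; /* ceiling division */
}

/* Resulting BCLK frequency in Hz (rounded down). */
static uint32_t bclk_hz(uint32_t sys_clk_hz, uint32_t divisor)
{
    return sys_clk_hz / divisor;
}
```

With a 100 MHz system clock and an 11 MHz limit the divisor is 10, giving a 10 MHz BCLK, which lies within the 8 to 11 MHz range accepted by the Ethernet chip. An even divisor also makes a 50 % duty cycle easy to obtain, by toggling the clock every N/2 system clock cycles.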

The buffered address latch enable signal, BALE, is asserted during the second half of Ts. It indicates that the address on SA is valid. When the microprocessor is not the bus master, e.g. when DMA is used, BALE is constantly asserted. For peripherals that are not fast enough to respond to a standard ISA bus cycle, the channel ready signal, CHRDY, can be used: the peripheral deasserts CHRDY and thereby adds clock cycles to the bus cycle until it asserts CHRDY again. To indicate that it can support the one-wait-state 16-bit I/O bus cycle, the peripheral asserts the I/O size 16 signal, IO16#.[15]

During power-on, the reset signal, RESDRV, is asserted in order to force the peripherals into a known state. RESDRV also prevents the peripherals from doing anything until the power has stabilized or until the microprocessor is ready to receive data or interrupts from a peripheral. Originally, the ISA bus structure had four interrupt lines that were not assigned to devices, IRQ9 to IRQ11 and IRQ15.⁷[15]

2.4.2 Standard 16-bit I/O device ISA bus cycles

There are a number of different ISA bus cycle types, depending on the nature of the implementation. The data width may be either 8 or 16 bits, and the peripheral may be either a memory device or an I/O device. This section describes the standard 16-bit I/O device bus cycle, which consists of one address clock cycle and two data clock cycles. Each step below corresponds to a reference point in figure 2.3.[15]

⁴ This bus is always zero during I/O operations.

⁵ The lower path consists of bits 0 to 7 of the ISA data bus.

⁶ The Ethernet chip used in this thesis accepts clock frequencies between 8 and 11 MHz.[16]

⁷ These names are used in the implementation described in this thesis. The only limitation on the number of interrupt lines, however, is imposed by the system interrupt handler. See section 3.2.3.

Figure 2.3: A standard access to a 16-bit I/O ISA device, showing BCLK over the Ts, Tc1 and Tc2 cycles, the address bus SA, BALE, IOR#/IOW#, read and write data on SD, CHRDY and IO16#, with reference points i to vii. The figure is a modified version of figure 17-3 in [15].

i. When BALE is asserted halfway through Ts, the address is latched onto the SA bus. The address remains on the SA bus for the remainder of the bus cycle, and until the next ISA address is latched onto the bus.

ii. If the current cycle is a write, the output data is latched onto the SD bus simultaneously with the address. The valid data is available on the data bus, SD, until halfway through Ts in the next ISA bus cycle.

iii. The first data cycle, Tc1, starts with the falling edge of BALE.

iv. At the midpoint of Tc1, the appropriate read or write signal is asserted. During a normal I/O bus cycle, the command line remains asserted until the end of the last data cycle, Tc2.

v. During the second data cycle, CHRDY is sampled. If the peripheral cannot complete the transaction by the end of this clock cycle, it should deassert CHRDY. Additional data time is then given until CHRDY is asserted again to indicate that the bus cycle can be completed.

vi. To determine whether the I/O device is 8 or 16 bits wide, IO16# is sampled at the midpoint of Tc2. If it is asserted, the bus cycle is terminated at the end of Tc2 without any data steering. However, the bus cycle is not terminated as long as CHRDY is deasserted.

vii. At the end of the last data cycle, the appropriate read- or write signal is deasserted. In case of a read operation, data is read from the SD bus at the rising edge of BCLK that terminates the bus cycle.
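The reference points above can be condensed into a trace with one entry per half BCLK period. The Python function below is an illustrative behavioural sketch, not part of the thesis implementation. It assumes that a peripheral holding CHRDY low stretches the cycle by whole BCLK periods, labelled Tw here only for the sketch, inserted before the final Tc2, and it represents the active-low strobes IOR# and IOW# by an active-high field named `cmd`.

```python
def isa_io_cycle(write: bool, wait_states: int = 0) -> list:
    """Half-BCLK trace of one 16-bit ISA I/O bus cycle (points i to vii).

    Each entry covers half a BCLK period. wait_states is the number of extra
    BCLK periods ("Tw", a label used only in this sketch) added while the
    peripheral holds CHRDY low.
    """
    phases = ["Ts", "Tc1"] + ["Tw"] * wait_states + ["Tc2"]
    trace = []
    for phase in phases:
        for half in (1, 2):
            before_mid_ts = phase == "Ts" and half == 1
            trace.append({
                "phase": phase,
                "half": half,
                # i, iii: BALE is high in the second half of Ts and falls as Tc1 begins
                "BALE": int(phase == "Ts" and half == 2),
                # i: the new address is valid on SA from the middle of Ts
                "addr_valid": int(not before_mid_ts),
                # ii: write data is put on SD together with the address
                "write_data": int(write and not before_mid_ts),
                # iv, vii: command strobe active from mid-Tc1 to the end of the last cycle
                "cmd": int(phase != "Ts" and not (phase == "Tc1" and half == 1)),
            })
    return trace
```

For example, `isa_io_cycle(write=False, wait_states=1)` yields eight half-periods, with the command strobe active in five of them instead of the three of an unstretched cycle.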

2.4.3 The OPB bus specification

IBM has developed the CoreConnect™ bus architecture, a family of three buses for interconnecting cores and custom logic in systems-on-chip. The family consists of the Processor Local Bus (PLB), the On-Chip Peripheral Bus (OPB), and the Device Control Register bus (DCR). The PLB is used to interconnect high-bandwidth devices such as processor cores, external memory interfaces, and DMA controllers, mainly in Virtex-II Pro devices. The DCR is a lower-performance bus intended for reading status and configuration registers.[17] The current system, however, employs only the OPB from the CoreConnect family; for communication between the processor core and on-chip memory, the Microblaze uses the Xilinx Local Memory Bus.[9]

The Microblaze uses one or more instances of the OPB to communicate with on-chip peripherals. The architecture provides a common, easy-to-use interface for various peripherals. The bus allows an arbitrary number of bus masters to read from and write to an arbitrary number of slaves. The Xilinx implementation, however, supports up to 16 masters and any number of slaves, limited only by the available hardware resources. When several OPB masters share a bus, an OPB Arbiter⁸ is used to grant exclusive bus access. Thus, a master may have to wait an arbitrary number of clock cycles until the bus is idle. To perform several slave operations per bus grant, an OPB master may use a bus lock, also referred to as burst or sequential access, which keeps the arbitration overhead to a minimum.[18, 19]

Physically, the address and data paths and the control signals of the bus are implemented as a distributed multiplexer in which the data buses and other outputs of the different slaves are ORed together.⁹ This approach makes it possible to add peripherals to a system without changing the existing peripherals.[18]
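The OR-based distributed multiplexer can be illustrated with a short Python sketch. The functions `slave_output` and `opb_bus` are hypothetical names, and buses are modelled as plain integers. The second example shows why footnote 9 matters: a single slave that does not drive zero when deselected corrupts the bus value.

```python
from functools import reduce
from operator import or_


def slave_output(selected: bool, value: int) -> int:
    """A compliant slave drives its data only while selected, otherwise zero."""
    return value if selected else 0


def opb_bus(outputs: list) -> int:
    """The bus value is the bitwise OR of every master and slave output."""
    return reduce(or_, outputs, 0)


# Slave 1 is selected; slaves 0 and 2 hold internal data but drive zero.
good = [slave_output(False, 0xFFFF), slave_output(True, 0x1234), slave_output(False, 0xABCD)]
assert opb_bus(good) == 0x1234

# A slave that keeps driving 0x00FF while deselected corrupts the low byte.
assert opb_bus([0x1234, 0x00FF]) == 0x12FF
```

Because the OR needs no per-slave select logic in the bus itself, adding a peripheral only adds one more operand.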

The OPB signals are grouped into five categories: arbitration signals, bus signals, data transfer control signals, byte enable support signals, and DMA peripheral support signals. The latter two categories are optional and are not discussed in this thesis. This section describes the signals relevant to an OPB slave.[18]

• Arbitration signals: If a slave does not respond to an operation, or does not assert Sln_toutSup¹⁰ (see data transfer control signals below) within 16 clock cycles after OPB_select is asserted, the arbiter asserts the OPB_timeout signal. It indicates to the master of the operation that the slave does not respond and that the operation has to be terminated.[18]

If a slave is unable to complete an operation, it should assert the Sln_retry signal. The assertion causes the requesting master to cancel the operation. Sln_retry must remain asserted until the slave is deselected.[18]

⁸ The Xilinx OPB implementation that comes with the EDK includes an arbiter.

⁹ Thus, every slave must drive all of its outputs to zero when inactive.

¹⁰ The Sln prefix indicates that the signal is directed from the slave to the bus. Mn is the prefix for a signal going from a master to the bus.


• Bus signals: Each slave is given an address range during the system specification. To access a slave, or a specific register in a peripheral, the bus master uses the address bus, OPB_ABus, which is 32 bits wide and carries its most significant bit in bit 0.[18] The data input and output signals of all OPB peripherals are separate, i.e. each peripheral has one 32-bit input data bus, OPB_DBus, and one 32-bit output data bus. The data outputs from all masters and slaves, Mn_DBus and Sln_DBus, are ORed together to form the data bus, OPB_DBus. Both input and output data have their most significant bit in position 0.[18]

• Data transfer control signals: When a bus master has been granted the bus by the arbiter, it asserts the OPB_select signal together with the correct address. The select signal is held high until the slave acknowledges the transaction, until OPB_retry is asserted, or until the arbiter asserts OPB_timeout. If the master in control of the bus terminates the transaction by deasserting OPB_select, all slaves must terminate the operation and reset their state machines.[18]

The direction of the transaction is indicated by OPB_RNW (Read Not Write), which must be valid whenever OPB_select is asserted. A high level indicates a read operation and a low level indicates a write.[18]

The bus architecture provides the OPB_seqAddr signal to indicate that the following bus cycle will access the next sequential address in the same direction as the current operation, which reduces the access latency. If OPB_seqAddr is correctly asserted by a master, there are no intervening bus transactions to other addresses. If the slave ignores the signal, the data transfer proceeds normally.[18]

To indicate that it is finished and, in case of a read operation, that valid data is available on Sln_DBus, the slave asserts the transfer acknowledge signal, Sln_xferAck. The signal must not be asserted for more than one clock cycle per data transfer, nor together with Sln_retry. If Sln_xferAck is asserted in the same cycle as OPB_timeout, the bus master should ignore the timeout signal and finish the transaction.[18]

If the slave encounters any kind of error, it should assert the error acknowledge signal, Sln_errAck, to terminate the operation. The signal must be asserted together with Sln_xferAck.[18]

By default the OPB Arbiter asserts OPB_timeout after 16 clock cycles. To prevent a bus timeout, slow slaves may assert the timeout suppress signal, Sln_toutSup, at any time before the 16th clock cycle. Sln_toutSup must remain asserted until the operation is completed.[18]
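The timeout rules above can be summarised as a small decision function. This Python sketch is a simplified model, not the Xilinx arbiter: cycle 1 is the first cycle with OPB_select high, OPB_timeout is assumed to fire in cycle 16 unless Sln_toutSup was raised in an earlier cycle, and Sln_retry is not modelled. The function name and result strings are illustrative.

```python
from typing import Optional

OPB_TIMEOUT_CYCLES = 16  # default arbiter timeout


def opb_transfer_result(xfer_ack_cycle: Optional[int],
                        tout_sup_cycle: Optional[int] = None,
                        limit: int = OPB_TIMEOUT_CYCLES) -> str:
    """Classify one slave response as "ack", "timeout" or "hang".

    Cycles are counted from 1, the first cycle with OPB_select high.
    None means the slave never asserts that signal.
    """
    suppressed = tout_sup_cycle is not None and tout_sup_cycle < limit
    if xfer_ack_cycle is not None and xfer_ack_cycle <= limit:
        # An ack in the same cycle as OPB_timeout wins: the master ignores the timeout.
        return "ack"
    if suppressed:
        # No timeout: the master waits for as long as the slave takes.
        return "ack" if xfer_ack_cycle is not None else "hang"
    return "timeout"
```

The `"hang"` case shows the price of suppressing the timeout: the arbiter no longer bounds the transfer, so a slow slave needs its own limit, which is the role of the timeout watchdog in the bus bridge described in chapter 3.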

2.4.4 The OPB cycle

The length of an OPB bus cycle depends on the slave. With a fast peripheral the cycle can be as short as two clock cycles, but it is usually longer. This section describes a basic bus operation regardless of the speed of the slave. Each step corresponds to a reference point in figure 2.4.[18]

i. One clock cycle after the arbiter grants a master access to the bus, the master asserts OPB_select. At the same time the valid address and, in case of a write operation, the valid data are driven onto OPB_ABus and OPB_DBus respectively.


[Timing diagram: OPB_CLK, OPB_ABus, OPB_DBus, OPB_select, OPB_RNW, Sln_xferAck, Sln_errAck, and Sln_toutSup over clock cycles 1 to n, with reference points i to vi.]

Figure 2.4: This figure describes a basic OPB transaction. The figure is a modified version of figure 6 in [20].

ii. At the same time as the select signal is asserted, OPB_RNW is set to the correct value: high for a read operation or low for a write operation.

iii. If the slave does not finish the transaction within 16 clock cycles of the assertion of OPB_select, it must assert the Sln_toutSup signal. Otherwise the slave cannot finish the transaction properly, since the arbiter will assert OPB_timeout.

iv. In case of a read transaction, the slave drives the data onto the data bus during the last clock cycle. At all other times Sln_DBus is driven to zero.

v. When the slave has finished the operation, it asserts Sln_xferAck for one clock cycle. If the transaction was unsuccessful, the slave should also assert Sln_errAck during the last clock cycle.

vi. When the master has registered Sln_xferAck, it deasserts OPB_select and at the same time drives zeros onto OPB_ABus and OPB_DBus. When OPB_select is deasserted, the slave should deassert all of its outputs to the OPB, i.e. drive them to zero.
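Steps i to vi can be tabulated from the slave's point of view. The Python sketch below lists the slave outputs for one read transaction, one dictionary per OPB clock cycle. The `latency` parameter, the function name, and the choice to assert Sln_toutSup for the whole transfer whenever the acknowledge does not arrive before the 16th cycle are assumptions made for illustration.

```python
def opb_slave_read_trace(latency: int, data: int, limit: int = 16) -> list:
    """Per-cycle slave outputs for one OPB read (steps i to vi in figure 2.4).

    latency is the cycle, counted from the assertion of OPB_select, in which
    the slave acknowledges. The final entry is the cycle after the ack, when
    the master has deasserted OPB_select.
    """
    if latency < 1:
        raise ValueError("latency must be at least one cycle")
    slow = latency >= limit        # conservatively suppress the timeout
    trace = []
    for cycle in range(1, latency + 1):
        last = cycle == latency
        trace.append({
            "cycle": cycle,
            "OPB_select": 1,
            "Sln_toutSup": int(slow),           # iii: held until the operation completes
            "Sln_xferAck": int(last),           # v: one cycle only
            "Sln_DBus": data if last else 0,    # iv: zero except in the last cycle
        })
    # vi: the master deasserts OPB_select and the slave drives all outputs to zero
    trace.append({"cycle": latency + 1, "OPB_select": 0,
                  "Sln_toutSup": 0, "Sln_xferAck": 0, "Sln_DBus": 0})
    return trace
```

For a three-cycle read, `opb_slave_read_trace(3, 0xABCD)` shows Sln_xferAck and the data appearing together in cycle 3, with every slave output back at zero in cycle 4.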


2.5 Designing OPB peripherals for Microblaze

Due to the complexity of modern SoC devices, standardizing the connection of different cores has been an important development issue for Xilinx. Incorporating the OPB into the EDK has made interfacing custom logic to a Microblaze system a relatively straightforward task.[20]

There are, however, some considerations to keep in mind when designing a custom OPB peripheral. The core must be compatible with the OPB protocol described in section 2.4.4, as well as the OPB interface. The design must also meet the requirements of the platform generator in order to enable the automated system synthesis flow. Additionally there are some general design guidelines proposed by Xilinx in order to improve timing:[20]

• Signals going to and from the user core should be registered

• Try to avoid using different clock domains by utilizing Clock Enables

• Reset output signals from slaves synchronously using OPB_xferAck

• If the data width is smaller than 32 bits, expand the data path to 32 bits and tie unused lines to zero, or apply appropriate steering logic

For compatibility with the platform generator, two additional files are needed together with the HDL code: the .PAO and .MPD files. The first file specifies the Peripheral Analyze Order and defines which HDL files are needed for synthesis. The second file is the Microprocessor Peripheral Description file. It defines the interface of the core, i.e. the properties of input and output ports, synthesis parameters, interrupts, etc.[20]

2.5.1 IPIF

To facilitate a common bus interface for core designers Xilinx has developed the OPB IP interface, IPIF. It is a simplified bus wrapper that takes care of the OPB timing protocol, address decoding, and appropriate byte steering onto correct byte lanes when the core data width is smaller than the OPB data width. The interface also includes the following optional features: interrupt handling, read- and write FIFOs, DMA, and Scatter Gather.[21]

A major disadvantage of IPIF is that it targets only CoreConnect bus architectures on Xilinx platforms. If a design is to be reused with a different bus architecture, or on an FPGA from another vendor, the core has to be rewritten.[22]

2.6 Clock domain crossing

The On-chip Peripheral Bus operates synchronously with the system clock,¹¹ while the ISA bus operates at a clock frequency between 8 and 8.33 MHz. A core that works in two different, asynchronous clock domains needs some mechanism for transferring data safely across the clock boundary.

One solution is proposed in [23]. When data is ready on the transmitting side, a D flip-flop is clocked, which asserts a flag signal shared by the transmitter and the receiver. As long as the flag is high, the transmitter must keep the data on the bus. The receiver reads the data into a register in the receiver clock domain and clocks a flip-flop that resets the transmit flip-flop, which pulls the flag low. When the flag is low, the flip-flop on the receiver side is reset.


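[Figure: flag handshake across the clock boundary. A transmitter and a receiver D flip-flop, with D inputs tied High and each with a Clear input, exchange the signals Ready, Flag, and Acknowledge while the parallel data passes from transmitter to receiver.]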
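The handshake can be modelled in Python to show the order of events. This is a sketch under stated assumptions, not a hardware description: each method call stands for one active clock edge in that domain, the asynchronous clear is assumed to take effect immediately and to dominate while it is active, metastability and propagation delays are not modelled, and the class and attribute names are hypothetical.

```python
class FlagHandshake:
    """Flag handshake between a transmitter and a receiver clock domain."""

    def __init__(self) -> None:
        self.tx_ff = 0        # transmitter flip-flop; its output is the shared flag
        self.rx_ff = 0        # receiver flip-flop; its output clears tx_ff
        self.bus = None       # parallel data held by the transmitter
        self.received = []    # data registered in the receiver domain

    @property
    def flag(self) -> int:
        return self.tx_ff

    def tx_clock(self, ready: bool, data=None) -> bool:
        """Transmitter clock edge; returns True if a transfer was started."""
        if self.flag or self.rx_ff:
            # Flag still high (data must stay on the bus) or clear still active.
            return False
        if ready:
            self.bus = data
            self.tx_ff = 1    # clocking the flip-flop raises the flag
            return True
        return False

    def rx_clock(self) -> None:
        """Receiver clock edge."""
        if self.flag and not self.rx_ff:
            self.received.append(self.bus)  # register the data
            self.rx_ff = 1                  # acknowledge ...
            self.tx_ff = 0                  # ... clears tx_ff, so the flag goes low
        elif not self.flag and self.rx_ff:
            self.rx_ff = 0                  # flag low: reset the receiver flip-flop


hs = FlagHandshake()
assert hs.tx_clock(True, 0x1234) and hs.flag == 1
assert not hs.tx_clock(True, 0x5678)   # busy: the first word is still in flight
hs.rx_clock()                          # receiver takes the data, flag drops
hs.rx_clock()                          # receiver flip-flop is reset
assert hs.tx_clock(True, 0x5678)
hs.rx_clock()
assert hs.received == [0x1234, 0x5678]
```

The model makes the cost of the scheme visible: every word needs at least two receiver clock edges before the transmitter may start the next transfer.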


Chapter 3

Implementation & Testing

This chapter describes the issues related to the design, implementation, and testing of the OPB to ISA bus bridge. The first section describes the available Ethernet options and how the choice was made, the second section explains the bridge, and the third section deals with the testing of the implementation. All text and figures in this chapter without a reference are contributions by the author.

3.1 Choosing Ethernet MAC/PHY

One of the IP cores provided with the EDK is the OPB Ethernet MAC core. In conjunction with an off-chip physical layer device it provides a working network interface. One drawback of the core, however, is its size: when included in a Microblaze-based embedded system it uses almost one third of a one-million-gate Virtex-II device. Since the FPGA might be used for more than merely a microprocessor connected to a local network or to the Internet, it is desirable to find a solution in which the system cooperates with an external Ethernet MAC/PHY device. Another drawback is that the Xilinx Ethernet core is not free: unless a licence fee is paid, it has to be restarted after three hours of use.

In order to connect the Ethernet MAC/PHY to the Virtex-II FPGA and the Microblaze system, the Ethernet device has to meet certain requirements:

• The I/O signals between the FPGA and the Ethernet chip must operate at 3.3V.

• For convenient testing and connection, the chip needs to be readily available on some sort of development board.¹

• The communication interface between the chip and the system needs to be of a known type that is either available in the Microblaze system or possible to implement.

• The operating system should contain a device driver for the chosen chip; otherwise a device driver has to be implemented.

• It should be possible to change the Ethernet MAC/PHY to another device in the future. At this stage there is no requirement on the communication speed of the Ethernet device, thus either a 10 Mbit/s or faster chip can be utilized.

¹ A development board is a printed circuit board containing all the supporting circuitry needed to use the chip.


There were several candidates for the Ethernet MAC/PHY chip to use in conjunction with a Microblaze-based embedded system. The considered chips were:

• Cirrus Logic CS8900A

• Davicom DM9008

• Realtek RTL8019AS

• Standard Microsystems Corporation LAN91c96 and LAN91c111

All of the chips provided an ISA bus interface, and the LAN91c111 had several other interfaces as well. The LAN91c111 also offered 10/100 Mbit/s communication speed.² No Ethernet MAC/PHY devices from Intel were considered, since they use the PCI bus architecture, which due to its complexity is not suitable for the target operating system, uClinux. The DM9008 and RTL8019AS offered only 5V I/O signals and were hence unsuitable for use with the Virtex-II FPGA, which supports only 3.3V I/O signals. Of the remaining three, only the CS8900A was readily available on a development board. Designing a custom printed circuit board (PCB) for the LAN91c111 was considered, but the idea was rejected due to limited time and limited PCB design experience. Thus, the CS8900A was chosen as the Ethernet MAC/PHY for this implementation. However, since the ISA bus interface is widespread among the available Ethernet MAC/PHY chips, future implementations of a Microblaze system with an external MAC/PHY are not bound to the CS8900A. The only requirement that was not met is thus the missing interface between the Ethernet chip and the Microblaze system.

The operating system chosen for the system, uClinux, contains a well-tested driver for the CS8900A chip, so no driver development needed to be undertaken.

3.2 OPB to ISA bridge

Prior to this work, no known IP core connecting the OPB with the ISA bus existed. Consequently a core linking the two bus architectures needed to be developed.

The first attempt was to use the IP interface (IPIF) provided by Xilinx. It was, however, unsuccessful due to the immaturity of IPIF.³ Instead, a more straightforward bus interface was created, similar to the interfaces of the existing OPB peripherals, in which a simple bus wrapper connects the custom logic to the OPB.

The OPB to ISA bus bridge was implemented as a custom OPB slave peripheral. The initial step in the design flow was to identify the parts of which the core should consist. These parts were designed as independent entities to allow development on a smaller scale. Thus, the simulation and functional verification of every entity was carried out separately, see section 3.4. A modular design methodology also makes future development of the core convenient, since only the part that needs to change has to be redesigned.

The design can roughly be divided into three areas: the OPB interface, the ISA interface, and the glue logic around and in between the bus interfaces. The different parts are:

• Bus wrapper

• OPB interface

• ISA interface

• Timeout watchdog

• ISA clock generator

• OPB to ISA communication (clock domain crossing)

² To use the 100 Mbit/s speed, the interface clock needs to run at no less than 25 MHz if an MII PHY interface is employed.

³ Only the IPIF from EDK v3.2 was tried; recent releases of the EDK include a revised version of IPIF that might work correctly.

Figure 3.1 shows the overall layout of the core and its signals. The glue logic includes the bus wrapper, the timeout watchdog, and the clock domain crossing (CDC) mechanism. The pselect module is shown in the figure but is part of the bus wrapper. The dashed line marks the clock boundary.

[Block diagram: in the OPB clock domain, the pselect decoder, the OPB interface, and the ISA clock generator, with the inputs OPB_ABus(0:31), OPB_DBus(0:31), OPB_BE, OPB_clk, OPB_Rst, OPB_RNW, OPB_select, and OPB_seqAddr and the outputs I2O_DBus(0:31), I2O_xferAck, I2O_errAck, I2O_toutSup, and I2O_retry. In the ISA clock domain, the ISA interface and the timeout watchdog, with ISA_SA(19:0), ISA_SD(15:0), ISA_BALE, ISA_IOR, ISA_IOW, ISA_BCLK, ISA_CHRDY, ISA_IO16, ISA_RESDRV, ISA_SHBE, and IRQ9, IRQ10, IRQ11, and IRQ15 routed to Interrupt9, Interrupt10, Interrupt11, and Interrupt15. The OPB to ISA communication block (Flag 1, Flag 2, Ready 1, Ready 2, Acknowledge 1, Acknowledge 2) spans the clock boundary; ISA_Finish, ISA_Failure, and OPB_select_i connect the domains. The bus wrapper encloses all modules.]

Figure 3.1: This block diagram illustrates the different parts of the bus bridge.

3.2.1 Specification

Table 3.1 summarizes the specification of the different parts of the core.

The OPB interface and the ISA clock generator operate synchronously with the system clock, which is equal to the OPB clock, while the ISA interface and the timeout watchdog operate synchronously with the ISA clock signal. The communication module is asynchronous and triggers on the ready and acknowledge signals from the bus interfaces. The signals to the CDC, however, are synchronous to their respective clock domains.

The bus bridge is activated when the OPB_select signal is asserted together with an address on the OPB address bus that lies within the address range of the core; the address range is set in the MHS file during system design. When the bridge is inactive, all output signals to the OPB are driven to zero, see section 2.4.3. Since the bus bridge is an OPB slave, it is never triggered by an ISA peripheral. Thus, the ISA interface is started only when an address or data is available from the OPB side. According to the specification in section 2.4.1, the address and data buses on the ISA side are latched. That is, at a certain point in the ISA bus cycle the address and, in case of a write operation, the data are written to their respective buses, and the contents of the buses are not changed until the next ISA bus cycle. The ISA control signals, however, are registered and reset at the end of each bus cycle. In the glue logic, the address and data paths are registered and driven to zero by the OPB interface when the bridge is inactive. The control signals, such as ISA_Finish and ISA_Failure, are registered and synchronous to the clock signals as well, while the flag signals from the CDC are registered but asynchronous to the clock signals.

Table 3.1: Bus bridge specification

Issue                      | OPB                                          | Glue                                                                | ISA
---------------------------|----------------------------------------------|---------------------------------------------------------------------|-----------------------------------------------------------------
Clock                      | System clock                                 | System and ISA clock                                                | ISA clock
Handshake                  | Triggered by OPB_select and correct address  | CDC started by ready signals; watchdog triggered by internal select | Started by the rising edge of the flag indicating available data
Data bus consistency       | Zero when inactive                           | Zero when inactive                                                  | Latched data
Address bus consistency    | Zero when inactive                           | Zero when inactive                                                  | Latched address
Control signal consistency | Zero when inactive                           | Registered                                                          | Registered

3.2.2 Components

Every module that performs a series of tasks based on inputs from other parts of the core or of the system, that is, the two bus interfaces and the timeout watchdog, was designed as a finite state machine (FSM). The FSM approach provided a straightforward way to make the bus interfaces follow their respective bus specifications, since an FSM is clocked and both bus architectures are synchronous. During synthesis the state machines are highly optimized by the synthesis tool, which minimizes device utilization. The ISA clock generator and CDC modules were not implemented as finite state machines but with a register-based approach.

In all three cases the finite state machines are of Moore type.⁴ That is, the outputs of the state machines depend only on the current state and not on the input signals. The Moore approach was necessary because the output signals of the state machines had to be synchronous to the clocks in order to follow the bus specifications strictly.

Bus wrapper

The bus wrapper of the OPB to ISA bridge provides the interface between the core and the surrounding system, and ties the different parts together into one visible entity. In figure 3.1 the wrapper is marked by the line surrounding the different modules. Apart from the modules designed specifically for this core, discussed below, the wrapper also contains the pselect module, a highly optimized address decoder provided by Xilinx. It takes the OPB_select signal and the address bus as inputs and generates the internal select signal when the address on the bus is within the address range of the core specified in the MHS file.

⁴ The alternative approach is the Mealy state machine, in which the outputs are affected by the inputs and can change asynchronously.
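The decoding performed by pselect can be sketched as a comparison of the upper address bits. The Python function below is not the Xilinx module. It assumes that the upper 24 bits of OPB_ABus, OPB bits 0 to 23, select the bridge, as stated in the OPB interface description below, and the base address 0x7A000000 in the examples is hypothetical; in the real system the range comes from the MHS file.

```python
def pselect(opb_select: int, opb_abus: int, base_addr: int,
            decode_bits: int = 24, addr_width: int = 32) -> bool:
    """Internal select: OPB_select high and OPB_ABus(0 to decode_bits-1) equal
    to the same bits of the core's base address (OPB bit 0 is the MSB)."""
    shift = addr_width - decode_bits   # low-order bits left to the core itself
    return bool(opb_select) and (opb_abus >> shift) == (base_addr >> shift)


BASE = 0x7A000000                         # hypothetical base address
assert pselect(1, 0x7A0000F3, BASE)       # inside the 256-byte range
assert not pselect(0, 0x7A0000F3, BASE)   # OPB_select low
assert not pselect(1, 0x7A0001F3, BASE)   # outside the range
```

Decoding 24 bits gives the bridge a 256-byte window, which matches the 8 address bits forwarded to the ISA bus.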

A number⁵ of custom parameters are used by the wrapper to provide flexibility during system design, see lines 36 to 40 in listing E.1.

• C_ISA_CLK_DIV is an integer forwarded to the ISA clock generator that specifies the factor by which the system clock is divided to obtain the ISA clock signal. By default it is set to 8, which is a suitable division factor for the system used during the design and implementation of this core.

• C_USE_CHRDY is a bit used by the ISA interface to decide whether it should wait for the CHRDY signal to be asserted before finishing the bus cycle. By default it is set to 1.

• C_NUM_ISA_CYC_TOUT is an integer forwarded to the timeout watchdog that specifies how many ISA clock cycles to wait before ISA_Failure is asserted. By default it is set to 16 clock cycles.

• C_ON_CHIP_SL_ONLY is a single bit telling the bus wrapper whether the ISA peripherals are on-chip only, or both on- and off-chip. This parameter must be 1 if only on-chip peripherals are connected to the ISA bus, since off-chip signals with no peripheral connected end up in an undefined, or high, state. By default it is set to 1, so it must be changed when the off-chip Ethernet MAC is connected to the ISA bus interface.

• C_NUM_ON_CHIP_SL is an integer telling the bus wrapper how many on-chip peripherals are connected to the ISA bus. It is used to generate the correct number of CHRDY signal inputs, see lines 237 to 274 in listing E.1. By default it is set to 1.
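The relation between C_ISA_CLK_DIV and the ISA clock can be illustrated with a counter-based divider. The Python sketch below assumes an even divisor and a clock that toggles every C_ISA_CLK_DIV/2 system clock cycles; the thesis states only that the ISA clock generator uses a register approach, so this is not necessarily how it is built. The 66 MHz system clock in the example is a hypothetical value chosen so that the default divisor of 8 lands inside the 8 to 8.33 MHz ISA range.

```python
def isa_bclk_wave(num_sys_cycles: int, clk_div: int = 8) -> list:
    """ISA_BCLK level in each system clock cycle for a toggling counter divider."""
    if clk_div < 2 or clk_div % 2:
        raise ValueError("this sketch assumes an even divisor of at least 2")
    half = clk_div // 2
    wave, count, level = [], 0, 0
    for _ in range(num_sys_cycles):
        wave.append(level)
        count += 1
        if count == half:              # toggle every clk_div / 2 system cycles
            count, level = 0, 1 - level
    return wave


def isa_bclk_hz(sys_clk_hz: float, clk_div: int = 8) -> float:
    """ISA clock frequency for a given system clock and divisor."""
    return sys_clk_hz / clk_div


assert isa_bclk_wave(16) == [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4
assert isa_bclk_hz(66_000_000) == 8_250_000   # hypothetical 66 MHz system clock
```

With the default divisor, three ISA clock cycles correspond to 24 system clock cycles, the figure used in the OPB interface description below.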

The signals to and from the ISA bus are connected to IOBs in the FPGA, see section 2.2.1. Input signals to the core, however, cannot be shared by on-chip peripherals and IOBs. Consequently, on-chip counterparts of global input signals must be created when on-chip ISA peripherals are used. This feature might be removed in the future, since such peripherals are of questionable use; here they are used for debugging and verification purposes, see sections 3.3 and 3.4.

OPB Interface

The OPB interface is connected to the OPB side of the bus wrapper. Its main responsibility is to communicate with the on-chip peripheral bus according to the bus specification, see section 2.4.4. The OPB interface is also responsible for initiating the ISA bus cycle. The state machine consists of six states: Idle, Send, Waiting, Receive, Finish, and Failure, as illustrated in figure 3.2. The code for the OPB interface is presented in listing E.2.

When the bus bridge is addressed by the processor, the internal select signal wakes the OPB interface from its Idle state on the next system clock cycle. The internal select signal, OPB_select_i, is generated by the pselect module in the bus wrapper. In the Send state the ISA bus cycle is started by asserting the OPB interface ready signal and keeping the address, and in a write operation the data, available and unaltered until the flag goes low. The flag, which is controlled by the clock domain crossing mechanism, is deasserted when the ISA interface asserts its acknowledge signal, see section 2.6. The falling edge of the flag changes the internal signal xfer_success from zero to one; xfer_success returns to zero when the FSM reaches its Waiting or Idle state, or upon a reset of the bus bridge, see lines 186 to 194 in listing E.2. When xfer_success is asserted, the next FSM state is set to Waiting.⁶

[State diagram: states Idle, Send, Waiting, Receive, Finish, and Failure; transition labels select = 1, xfer_success = 1, RNW = 1, Ackn = 1, ISA_Finish = 1, ISA_Failure = 1, select = 0, and reset.]

Figure 3.2: The OPB Finite state machine

The OPB_RNW signal indicates whether the current transaction is a read or a write. In a read transaction no waiting is done in the Waiting state; on the next clock cycle the FSM moves to Receive, and the waiting for data is done there instead. When data has been read from the ISA bus and the ISA to OPB flag is high, the data is written to I2O_DBus. The state machine steps into its Finish state when the OPB interface has asserted its Acknowledge signal. If the ISA peripheral fails to assert CHRDY, the bus cycle is cancelled by the timeout watchdog and the next state of the FSM is set to Failure. In a write transaction, however, the OPB interface waits in the Waiting state for the ISA interface to acknowledge the write. When the ISA interface has successfully written the data to the ISA peripheral, the ISA_Finish signal is asserted, see the description of the ISA interface below. ISA_Finish causes the OPB FSM to step into its Finish state, from which the state machine returns to Idle on the next clock cycle. If the ISA interface does not assert ISA_Finish because the ISA peripheral does not assert CHRDY, the write operation is cancelled by the timeout watchdog.

When idle, the OPB interface must drive all of its outputs to the OPB to zero, because the signals from different peripherals are ORed together to form the bus lines. An I/O ISA bus cycle takes at least three ISA clock cycles, equivalent to 24 system clock cycles, which exceeds the 16-cycle OPB timeout. The timeout suppress signal, I2O_toutSup, should therefore be asserted as soon as a bridge transaction is started, and it must remain asserted until the transaction is successfully finished or cancelled. In this implementation I2O_toutSup is asserted when the finite state machine reaches its Send state and stays asserted until the bridge transaction is finished or cancelled; that is, it is deasserted only when the OPB FSM is idle. When a transaction is finished, I2O_xferAck is asserted during the last OPB clock cycle. In case of a failure, I2O_errAck is asserted together with I2O_xferAck to indicate to the host system that the transaction failed.

⁶ The state is called Waiting instead of Wait because wait is a reserved keyword in VHDL.
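The behaviour described above can be summarised as a Moore machine in Python, with a next-state function and an output function that looks only at the current state. This is a sketch reconstructed from the prose and figure 3.2, not a translation of listing E.2: the argument names mirror the signals in the text, and the transitions from Failure to Idle, and into Failure from Send, are assumptions.

```python
from enum import Enum, auto


class OpbState(Enum):
    IDLE = auto()
    SEND = auto()
    WAITING = auto()
    RECEIVE = auto()
    FINISH = auto()
    FAILURE = auto()


def opb_next(state: OpbState, select: int, rnw: int = 0, xfer_success: int = 0,
             ackn: int = 0, isa_finish: int = 0, isa_failure: int = 0) -> OpbState:
    """Next state on a rising system clock edge."""
    if not select:                                  # deselected or cancelled
        return OpbState.IDLE
    if state is OpbState.IDLE:
        return OpbState.SEND
    if isa_failure and state in (OpbState.SEND, OpbState.WAITING, OpbState.RECEIVE):
        return OpbState.FAILURE                     # watchdog timeout
    if state is OpbState.SEND:
        return OpbState.WAITING if xfer_success else OpbState.SEND
    if state is OpbState.WAITING:
        if rnw:                                     # read: go straight to Receive
            return OpbState.RECEIVE
        return OpbState.FINISH if isa_finish else OpbState.WAITING
    if state is OpbState.RECEIVE:
        return OpbState.FINISH if ackn else OpbState.RECEIVE
    return OpbState.IDLE                            # Finish and Failure last one cycle


def opb_outputs(state: OpbState) -> dict:
    """Moore outputs: a function of the current state only."""
    done = state in (OpbState.FINISH, OpbState.FAILURE)
    return {
        "ready": int(state is OpbState.SEND),
        "I2O_toutSup": int(state is not OpbState.IDLE),
        "I2O_xferAck": int(done),
        "I2O_errAck": int(state is OpbState.FAILURE),
    }
```

Because `opb_outputs` never looks at the inputs, every output changes only on a clock edge, which is the property that motivated the Moore choice.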

The On-chip Peripheral Bus uses big-endian bit ordering, i.e. the most significant bit is located in position 0 of the data field. The ISA bus, on the contrary, uses little-endian bit ordering, where the least significant bit is located in position 0. Data going to the ISA bus is located in bits 0 to 15 of OPB_DBus, and data coming from the ISA peripheral should be written to bits 16 to 31 of I2O_DBus. Also, only the lower 8 bits of OPB_ABus are of any importance for the ISA bus; the upper 24 bits are used to address the bus bridge itself. The bit reordering and bit steering are carried out by the OPB interface, see lines 238 to 269 in listing E.2.
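The steering can be expressed with Python integers, where bit k has weight 2^k, so OPB bit i corresponds to integer bit 31 - i. Under that convention the reordering does not change any numeric value; it reduces to picking the correct halfword or byte. The function names below are hypothetical, and the sketch covers only the data and address mapping stated in the text.

```python
def opb_bit(word: int, i: int) -> int:
    """Bit i of a 32-bit OPB word, using OPB numbering (bit 0 is the MSB)."""
    return (word >> (31 - i)) & 1


def opb_to_isa_data(opb_dbus: int) -> int:
    """Write path: OPB_DBus(0:15) drives SD(15:0); OPB bit 0 lands on SD bit 15."""
    return (opb_dbus >> 16) & 0xFFFF


def isa_to_i2o_dbus(isa_sd: int) -> int:
    """Read path: SD(15:0) goes to I2O_DBus(16:31); bits 0 to 15 are tied to zero."""
    return isa_sd & 0xFFFF


def isa_address(opb_abus: int) -> int:
    """Only the low 8 address bits, OPB_ABus(24:31), reach the ISA bus."""
    return opb_abus & 0xFF


w = 0xABCD0000
assert opb_to_isa_data(w) == 0xABCD
assert opb_bit(w, 0) == (opb_to_isa_data(w) >> 15) & 1   # MSB maps to SD bit 15
assert isa_to_i2o_dbus(0x1234) == 0x00001234
assert isa_address(0x7A0000F3) == 0xF3
```

Expanding the 16-bit read data with zeros in the upper halfword also satisfies the guideline in section 2.5 about widening narrow data paths to 32 bits.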

If a bridge transaction is cancelled by the system, or if the core for any reason is reset, the OPB state machine is set to Idle. Thus, all internal and external signals are reset to a known state. Furthermore, if the ISA interface does not respond properly the state machine will always reach the Failure state due to the timeout signal from the watchdog. Thus, the state machine is free from deadlocks.

The design of the OPB interface was a fairly straightforward task thanks to the many examples available with the EDK. To improve timing, the guidelines in section 2.5 were followed as closely as possible. That is, the inputs from, and outputs to, the OPB are registered, the core is reset when the bridge transfer is acknowledged (OPB_xferAck is asserted), and the 16-bit data path from the ISA interface is expanded with zeros to 32 bits. Note that some signals, for instance the outputs from the FSM, are not explicitly registered. During synthesis, however, all signals are registered.

ISA Interface

Connected to the ISA side of the bus wrapper, the ISA interface is responsible for communicating with on- and off-chip ISA peripherals. That is, it acts as a bus master on the ISA bus and carries out bus cycles according to the specification, see section 2.4.1. The ISA interface state machine consists of four states, Idle, Address, Data1, and Data2, corresponding to the ISA bus cycle specification, see figure 3.3. Note that the ISA bus cycle is only ever started by the Microblaze system, through the OPB interface, and never by an ISA peripheral. A peripheral can, however, notify the system that data is available by raising an interrupt, if interrupts are supported by the device driver. The code for the ISA interface is presented in listing E.3.

The state machine is started when the flag from the clock domain crossing module indicates that a valid address and data are available from the OPB interface. Once started, the state machine spends one clock cycle in each of the first two states, Address and Data1. In the last state, Data2, the state machine waits for the CHRDY signal to be asserted in order to finish the bus cycle.

Originally the ISA interface module was designed to always wait for the CHRDY signal to be asserted before continuing. It is not mandatory, however, for ISA peripherals to use the signal. The CS8900A development board used in this project, for instance, does not indicate a successful read or write by asserting CHRDY. The ISA interface then simply assumes that the peripheral is working properly; in a commercial system, however, the CHRDY signal should be used to ensure that data is read and written correctly. To handle peripherals without a CHRDY signal, a parameter was introduced in the core that indicates whether the ISA interface should check the signal. The parameter is set in the MHS file during system design. If the ISA peripheral does not use CHRDY, the internal counterpart is tied high during synthesis.

[State diagram: states Idle, Address, Data1, and Data2; transition labels flag = 1, Address ready, Data1 ready, CHRDY = 1, and reset.]

Figure 3.3: The ISA Finite state machine

According to the specification, the BALE signal should make a low-to-high transition at the falling clock edge in the address cycle and fall again at the next rising edge of the ISA clock. To obtain this behaviour, BALE is formed by ANDing two signals: one that is set high at the falling edge of the ISA clock in the address cycle and low at all other falling edges, and one that is high during the address cycle. See lines 123 to 160 and 289 to 290 in listing E.3. The falling edge of BALE marks the end of the address cycle and the transition from the Address state to the Data1 state. At the falling ISA clock edge in the middle of Data1, the appropriate read (IOR) or write (IOW) signal on the ISA bus is asserted. These signals are active low and go high again at the end of the last data cycle. IOR and IOW are generated in a similar fashion to BALE, see lines 166 to 178 and 294 to 295 in listing E.3.
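The AND construction of BALE can be checked with a half-period model. The Python sketch below is an illustration, not the code from listing E.3: it takes the ISA FSM state for each BCLK period, assumes that the state changes on the rising edge, and updates the first signal only on the falling edge. The function and variable names are hypothetical.

```python
def bale_trace(states: list) -> list:
    """BALE for each half BCLK period, given the ISA FSM state of each period.

    Half 1 of a period follows the rising edge, half 2 the falling edge.
    """
    bale = []
    at_falling_edge = 0                               # registered on falling edges only
    for state in states:
        in_address = int(state == "Address")          # high for the whole address cycle
        bale.append(at_falling_edge & in_address)     # half 1: previous falling-edge value
        at_falling_edge = int(state == "Address")     # falling edge inside this period
        bale.append(at_falling_edge & in_address)     # half 2
    return bale


trace = bale_trace(["Idle", "Address", "Data1", "Data2", "Idle"])
assert trace == [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # high only in the second half of Address
```

Neither signal alone has the right shape: the falling-edge signal stays high into the first half of Data1, and the state signal is high for all of Address, but their AND is high exactly from the middle of the address cycle to its end.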


the high byte of the data bus is about to take place. The CS8900A chip requires the signal to make a high-to-low transition followed by a low-to-high transition after any hardware or software reset. The IO16 signal is not used by the CS8900A development board, and it is omitted in this implementation. At this stage it is assumed that only 16-bit ISA peripherals will be connected to the bus bridge.

The ISA data bus is bidirectional, i.e. the same bus lines are used for both reads and writes. To achieve this, the data bus is implemented with tri-state buffers, as recommended in [20].
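The behaviour of a tri-state data bus can be summarised by a resolution rule:

* At most one driver may enable its output buffer at a time.
* Every other driver presents high impedance.

The Python sketch below models this rule. The driver names and the functions `resolve_bus` and `bridge_output` are hypothetical. The sketch assumes the bridge drives the bus only during writes and releases it during reads, so the peripheral can drive it.

```python
def resolve_bus(drivers):
    """Resolve a shared tri-state bus (hypothetical model).

    drivers maps a driver name to the 16-bit value it drives, or None when its
    output buffer is disabled (high impedance).
    """
    active = {name: value for name, value in drivers.items() if value is not None}
    if not active:
        return "Z"                                  # nobody drives: the bus floats
    if len(active) > 1:
        raise ValueError(f"bus contention between {sorted(active)}")
    return next(iter(active.values())) & 0xFFFF    # 16-bit ISA data bus


def bridge_output(write, data_out):
    """The bridge enables its buffers during writes and releases the bus otherwise."""
    return data_out if write else None
```

A write then yields the bridge's value on the bus. A read yields the peripheral's value. Two enabled drivers are reported as contention rather than silently resolved.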

The internal reset signal in the ISA interface is created by ORing OPB Rst, the Failure signal from the watchdog, and the inverse of OPB select i. That is, the ISA interface is reset and the state machine returns to Idle in three cases:

* at any global reset;
* at a timeout failure;
* whenever the core is deselected by the system.

At a reset the RESDRV signal is also asserted, resetting all ISA peripherals as well. Consequently, the state machine can never be trapped in a deadlock that keeps the rest of the system waiting.
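The reset condition is a three-input OR with one inverted input. The Python sketch below enumerates it exhaustively to show that the interface leaves reset only in one case: the core is selected and neither a global reset nor a watchdog failure is present. The function and argument names are hypothetical stand-ins for OPB Rst, the watchdog Failure output and OPB select i.

```python
from itertools import product


def isa_reset(opb_rst: bool, failure: bool, opb_select: bool) -> bool:
    """Internal ISA reset: global reset OR watchdog failure OR core deselected."""
    return opb_rst or failure or not opb_select


# All input combinations (opb_rst, failure, opb_select) that keep the
# ISA interface out of reset.
running = [combo for combo in product([False, True], repeat=3) if not isa_reset(*combo)]
```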

Timeout watchdog

One of the most critical parts of the OPB to ISA bus bridge is the timeout watchdog. It ensures that the core never consumes more than a specified number of clock cycles. It is vital that the timeout watchdog never fails, since its most important task is to reset the bus interfaces when an error occurs in the ISA interface or in a peripheral.

Figure 3.4: The timeout watchdog finite state machine (states Idle, Counting and Failure; transitions labelled flag = 1, Timeout = 1, ISA_Finish = 1 and OPB_select = 0)
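The watchdog can be sketched as a counter plus a three-state machine clocked by the ISA clock. The Python model below is only an illustration under several assumptions:

* `timeout` is a hypothetical limit in ISA clock cycles.
* The Counting to Idle transition on ISA_Finish and the Failure to Idle transition on OPB_select = 0 are read from the labels of Figure 3.4, not confirmed by the text.
* The class name `Watchdog` and its `clock` method are hypothetical.

```python
from enum import Enum


class WD(Enum):
    IDLE = "Idle"
    COUNTING = "Counting"
    FAILURE = "Failure"


class Watchdog:
    """Cycle-level sketch of the timeout watchdog (hypothetical model).

    timeout is an assumed limit in ISA clock cycles; the transition
    conditions follow the labels of Figure 3.4 and are assumptions.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.state = WD.IDLE
        self.count = 0

    def clock(self, flag, isa_finish, opb_select):
        """Advance one ISA clock cycle and return the ISA Failure output."""
        if self.state is WD.IDLE:
            if flag:                              # OPB to ISA flag starts the watchdog
                self.state, self.count = WD.COUNTING, 0
        elif self.state is WD.COUNTING:
            if isa_finish:
                self.state = WD.IDLE              # bus cycle completed in time
            else:
                self.count += 1
                if self.count >= self.timeout:    # Timeout = 1
                    self.state = WD.FAILURE
        elif not opb_select:                      # in Failure: released once deselected
            self.state = WD.IDLE
        return self.state is WD.FAILURE
```

With `timeout=3`, a bus cycle that never finishes raises Failure on the fourth clock after the start. Failure stays high until the core is deselected.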

The watchdog consists of one counter and one finite state machine. The three states of the machine are Idle, Counting, and Failure, see figure 3.4. As shown in figure 3.1, the watchdog has four inputs (ISA clock, OPB to ISA Flag, OPB select, and ISA Finish) and one output (ISA Failure). It operates synchronously with the ISA clock, and the state machine is started, by changing state from Idle to Counting, when the flag is high. When the counter reaches its

