Implementation of an FPGA based Emulator for High Speed Power Electronic Systems

DEGREE PROJECT, IN MASTER'S PROGRAM IN EMBEDDED SYSTEMS, SECOND LEVEL

STOCKHOLM, SWEDEN 2014

MUHAMMAD WASIF ADNAN

KTH ROYAL INSTITUTE OF TECHNOLOGY

TRITA ICT-EX-2014:196

Master's Thesis at School of Information and Communication Technology (ICT), Kungliga Tekniska Högskolan (KTH), Stockholm, Sweden

Supervisors: Dr. Sebastien Mariéthoz (ETH Zürich); Oliver Schultes (ETH Zürich); Jamshaid Sarwar Malik (KTH)

Examiner: Dr. Ahmed Hemani (KTH)

Acknowledgements

I would like to thank Dr. Sebastien Mariéthoz and Oliver Schultes for providing me the opportunity to carry out this project at the Automatic Control Lab (IfA) at ETH Zurich. Oliver was a great help to me during the implementation phase of the project and provided very useful insights regarding sound programming practices in VHDL. The project also gave me the opportunity to interact with some very bright minds carrying out research at IfA. I would also like to thank Dr. Ahmed Hemani for agreeing to be my thesis examiner. I am grateful to Jamshaid Malik for providing very valuable feedback on the thesis, in addition to some much needed encouragement. I owe my interest and aptitude in the field of embedded systems and control of power electronics to two of my former bosses and mentors, Dr. Shoab Khan of CARE Pvt. Ltd. and Björn Jernström of Ferroamp Elektronik, for which I am immensely grateful. Finally, I am forever indebted to my family, who have supported and encouraged me in every possible way and have always been there for me.

Abstract

During development of control systems for power electronic systems, it is desirable to test the controller in real-time, by interfacing it with an emulator device. In this context, this work comprises the development of an emulator that can accurately model the dynamics of high speed power electronic systems and provides interfaces that are compatible with the real hardware. The real-time state calculations, based on discrete models, were performed on custom logic, implemented on an FPGA. The realized system can emulate Linear Parameter Varying (LPV) systems, achieving sampling rates up to 12 MHz using a low cost Xilinx FPGA. As a result, power electronic systems with very high switching frequencies can be modeled. In addition, the FPGA incorporates a soft-core processor that allows a designer to easily re-configure the system model through software. The emulator system has been validated for a multiphase DC-DC converter, by comparing its results with the real hardware setup.

Keywords. Hardware-in-Loop Simulations, Control of Power Electronic Systems, FPGA, Hardware-Software Co-design

Contents

Glossary
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
    1.1.1 System Overview
  1.2 Emulator Requirements
  1.3 Design Specifications
    1.3.1 Existing Architectures
    1.3.2 Design Parameters
  1.4 Thesis Organization

2 Generic Model for Power Electronic Systems
  2.1 Design Considerations
    2.1.1 State Space Representation
    2.1.2 Linearity
    2.1.3 Discrete Time
  2.2 Example Systems
    2.2.1 Multi-Phase DC-DC Converter
    2.2.2 Three phase Induction Motor Drive
  2.3 Linear Parameter Varying Systems
    2.3.1 Discretization

3 Firmware Architecture
  3.1 Functional Overview
  3.2 Digital HW Platform
  3.3 Soft-Core Processor
    3.3.1 IP Selection
    3.3.2 SoC Configuration
    3.3.3 WB Slave Interface with Custom Logic Blocks
  3.4 Analog-to-Digital Converter Emulation
    3.4.1 Specifications
    3.4.2 SPI Master
    3.4.3 SPI Slave

4 State Calculation Logic
  4.1 Matrix-Vector Multiplication
  4.2 Decomposition of the System Matrices
  4.3 Module Design
    4.3.1 Processing Element
    4.3.2 PE Integration
    4.3.3 FSM Design
  4.4 Fixed Point Arithmetic

5 Hardware Validation
  5.1 Testing Environment
    5.1.1 Multi-phase DC-DC Converter : Model
    5.1.2 Hardware Setup
    5.1.3 PWM Signal
  5.2 Waveform Comparison

6 Implementation Results
  6.1 Maximum Clock Frequency
  6.2 Scalability
  6.3 Example System: Sampling Rate and Resource Usage
  6.4 Numerical Error

7 Conclusion
    7.0.1 Future Work

Bibliography

A Appendix: LatticeMico32 Processor Configuration
  A.1 Introduction
  A.2 Integrating a LM32 based SoC with Xilinx tools
    A.2.1 Steps for Integration with Xilinx Tools
    A.2.2 Using Block RAM for Code/Data memory
    A.2.3 Using Xilinx data2mem Utility to update bitfile with Software Executable

B Appendix: Data Sheet References
  B.1 Slice Architecture: Xilinx DSP48A1 Slice for Spartan6 FPGA Family
  B.2 SPI Interface Timing Diagram

C Appendix: HDL Code Snippets
  C.1 Dual Port Block Memory with Generic Bus-Width
  C.2 Generic Depth Moving Average Filter for Binary Inputs

Glossary

ADC     Analog to Digital Converter
ASIC    Application Specific Integrated Circuit
DSP     Digital Signal Processor
FPGA    Field Programmable Gate Array
FACTS   Flexible Alternating Current Transmission System
FW      Firmware
HDL     Hardware Description Language
HIL     Hardware-in-Loop
HVDC    High Voltage Direct Current
HW      Hardware
IC      Integrated Circuits
IGBT    Insulated Gate Bipolar Transistor
IP      Intellectual Property
LPV     Linear Parameter Varying
LTI     Linear Time Invariant
MACC    Multiply-Accumulate
MOSFET  Metal Oxide Semiconductor Field Effect Transistor
ODE     Ordinary Differential Equation
PWM     Pulse Width Modulation
SW      Software
ZOH     Zero Order Hold

List of Figures

1.1 Generic Switched Power Electronic System (top), replaced by an emulator (bottom)
2.1 Circuit diagram of a Multi-phase DC-DC Converter with 3 phase legs
2.2 Ideal Transformer (IRTF) based diagram of Induction Motor
3.1 HIL Simulation Setup implemented on two FPGAs
3.2 Avnet Lx9 Microboard, with a Xilinx Spartan6-Lx9 FPGA
3.3 Interface between CPU and custom logic blocks
3.4 Architecture and Interfacing for the SPI Master/Slave blocks
3.5 Finite State Machine for SPI Master block
3.6 Finite State Machine for SPI Slave block
4.1 Functional Diagram of the State Calculation Logic
4.2 State Calculation: Single Processing Element
4.3 Architecture of the State Calculation Module
4.4 Finite State Machine for State Calculation Module
5.1 Hardware Setup for Emulator Validation
5.2 Inductor Current
5.3 Output Voltage: Hardware (top) vs. Emulator (bottom)
5.4 Step change in Duty Cycle: Hardware (Left) vs. Emulator (Right)
6.1 Numerical Error in the Output Waveform
6.2 Mitigation of Numerical Error by Pre-Filtering
B.1 Source: Spartan 6 FPGA DSP48A1 User Guide UG389 (2014)
B.2 Source: ADC122S051 Data sheet (SNAS257E), Texas Instruments (2004)

List of Tables

3.1 Comparison of Soft-core Processor IPs
4.1 Example Layout of the Coefficient Block RAM
5.1 Multi-phase DC-DC Converter - Load Circuit Parameters
6.1 Performance and Resource Utilization Metrics based on Model Parameters
6.2 Total Resource Utilization for Emulator for a 3-phase DC-DC Converter

Chapter 1

Introduction

1.1 Motivation

Switched power electronic circuits are an integral part of a wide variety of power systems. Their applications range from high power Flexible AC Transmission Systems (FACTS) and High Voltage DC (HVDC) systems, to medium power electrical drives and solar inverters, to low-power switched power supplies. Their use allows efficient and flexible power transfer between the source and the load.

Design and implementation of control systems for switched power electronic systems is a popular research area. Keeping in view the increasing ubiquity of such circuits in all types of power systems, advanced control techniques are employed [3], primarily to achieve the goal of high efficiency (low switching losses in the circuit) and low harmonics and distortion in the output.

Power electronic systems are often employed in power distribution grids or industrial applications, with tough requirements on safety and reliability. Given the increasingly complex converter topologies and control techniques, it is crucial to verify and validate the controllers, in real-time, before deploying them. The simplest approach would be to test the controller with actual hardware. However, there are several challenges associated with testing on real hardware [4]:

• Testing cannot commence until a hardware prototype has been manufactured.

• Power electronic systems often include integrated components, which makes any modifications time-consuming and expensive.

• An error in the controller design/implementation can result in physical damage to hardware.

• Observation of the dynamic states of the system requires interfacing with sensors. This may be impossible in some cases. Even when possible, measurement noise needs to be taken into account.

• Testing at or beyond the safe operating limits of the real hardware is not feasible.

• Tests on real hardware are not repeatable, as they depend on many physical factors, which may vary between the conducted tests.

In this context, a desirable method is to test the controller against a real-time emulator of the actual hardware. This process is also known as hardware-in-loop simulation (HIL). The emulator, implemented on a digital platform, models the dynamics of the power electronic system and provides interfaces that are compatible with real hardware. This allows for successful deployment of the controller with real hardware, as soon as it has been verified with the emulator.

HIL simulations are also useful during rapid-prototyping of power electronic systems. Since the emulator is model-based, it can be modified easily, as compared to modifications in physical hardware, which are time-consuming and expensive.

At this point, it is worth noting that offline simulations in software - while useful - are not suitable for validation, as they usually exclude the effects of latency, discretization and quantization that are inherent in digital controllers. Including these effects is often prohibitively difficult and time-consuming in simulation environments. Simulations of switched systems would also have to incorporate the inherent discontinuities, which require very high sampling rates to be modeled accurately.

On the other hand, HIL simulations, with interfaces compatible with real hardware, allow verification and validation of the controllers during all stages of development, right up to production. Since the model calculations run in real time, accurate results can be obtained over long test runs.

This thesis presents the development of an emulator for a generic switched power electronic system, which can be used to verify digital controllers for such systems through HIL simulations.

1.1.1 System Overview

The fundamental components of a switched power electronic system include a DC voltage source, a converter circuit with controllable switches (typically IGBTs or MOSFETs) and a load circuit. The load can either be electrical (e.g. power supplies) or electro-mechanical (e.g. electrical drive systems). There exist several topologies for the converter circuit, based on the application, e.g. Buck, Boost, Ćuk, Neutral Point Clamped (NPC) etc [2].

In almost all cases, the power electronic system is controlled through Pulse Width Modulated (PWM) signals that control the ON/OFF states of the controllable switches. Although analog controllers exist, the PWM signals are usually generated by a digital controller.

Figure 1.1 shows an overview of a basic switched power electronic system, interfaced to a digital controller. There are two basic interfaces between the controller and the circuit. Based on the control algorithm, the controller can modify the period (switching frequency), the duty cycle and/or the phase shift of the PWM signals that control the switches in the converter circuit. There can be more than one PWM channel, as in the case of multi-phase systems, e.g. 3-phase AC circuits, cascaded multi-phase DC-DC converters etc. For closed-loop control, current and voltage feedback is provided to the controller through an A/D converter. Since sampled data from multiple channels is required, it is common to use a serial protocol (e.g. SPI, I2C etc.).

Figure 1.1: Generic Switched Power Electronic System (top), replaced by an emulator (bottom)

For HIL simulations, the real power electronic system is replaced by an emulator, implemented on a digital platform. The emulator computes the dynamic states of the real system from the inputs provided by the controller and a discretized model of the system. The I/O interfaces of the emulator mimic those of the real system, so that once the controller has been verified against the emulator, it can immediately be connected to the real hardware without any modifications.

1.2 Emulator Requirements

The states of the system are computed through a set of differential equations. The order of the differential equations depends on the number of energy storage elements in the load circuit. Power electronic systems, especially multi-phase systems, are typically high order systems and are hence more complicated to model. There exist other types of non-linearities, based on the type of the load circuit. For example, in the case of an electrical drive circuit, the state (armature current/magnetic flux) varies in a rotating frame of reference. The emulator needs to be able to perform complex computations.

The dynamics of a power electronic system associated with switching are inherently non-linear, comprising high frequency discontinuities. The entire system can be thought of as a hybrid system. The discontinuities - associated with switching - can be regarded as transitions between discrete states. Each discrete state is then associated with a different continuous-time transfer function.

It is possible to approximate over the discontinuities; for example, the output voltage of a buck converter is a linear function of the voltage input and the duty cycle of the PWM input. However, this approximation excludes the transient response of the system, and the effect of high frequency variations in input. In order to compute the dynamics of the system for high frequency discontinuities, the emulator needs to sample at a very high frequency. It must also be able to sample the input PWM signal with a very low latency.
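The link between emulator sampling rate and PWM timing resolution can be illustrated with a small sketch. It assumes an ideal PWM signal that is high for the first part of each switching period, and uses the 12 MHz sampling rate and 100 kHz switching frequency quoted elsewhere in this thesis; the function names are hypothetical and chosen only for this illustration.

```python
def samples_per_period(f_sample, f_switch):
    """Number of emulator samples taken in one PWM switching period."""
    return round(f_sample / f_switch)


def sample_pwm(duty, n_per_period, periods=1):
    """Sample an ideal PWM signal that is high for the first `duty` fraction of each period."""
    return [
        1 if (k % n_per_period) / n_per_period < duty else 0
        for k in range(periods * n_per_period)
    ]


def estimate_duty(samples):
    """Duty cycle as seen by the emulator: the fraction of samples that are high."""
    return sum(samples) / len(samples)


if __name__ == "__main__":
    n = samples_per_period(12e6, 100e3)   # 120 samples per switching period
    for duty in (0.25, 0.333, 0.5):
        seen = estimate_duty(sample_pwm(duty, n))
        # The observed duty cycle is quantized in steps of 1/n.
        print(duty, seen, abs(seen - duty) <= 1.0 / n)
```

Under these assumptions, each switching period is resolved into 120 samples, so a switching edge is located to within one sample period and the duty cycle is observed with a resolution of about 0.8 %. A lower sampling rate coarsens this resolution proportionally.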

Since the emulator is essentially a tool employed during the verification and validation phase of development, it needs to be flexible. It should be possible to re-configure the hardware, based on any changes in the plant model. This can also help during rapid-prototyping of power electronic systems.

1.3 Design Specifications

1.3.1 Existing Architectures

Hardware-in-Loop simulators are traditionally implemented in software [6] [7], using real-time processor hardware, such as xPC targets. A major advantage of such an approach is the ease of development and faster deployment of the solution. For example, the simulator can be designed in Simulink, and then loaded directly onto an xPC target for real-time implementation.

On the other hand, software based solutions hit bottlenecks due to the speed limitations of the processors. More significantly, due to sequential implementation, the maximum sampling rate of the emulator is inversely proportional to the order of the system model.

Initial designs of HIL simulators for power electronic systems used FPGAs simply for low latency I/O interfaces [4]. The computation units were limited to software. However, modern FPGAs, equipped with DSP blocks capable of running at very high clock frequencies, also allow complex calculations to be implemented in firmware. Additionally, by using parallelism - inherent in FPGAs - it is possible to perform these calculations at very high sampling rates.

FPGA based design also allows a higher degree of modularity, such that sections of the system can be de-coupled from each other, and execute independently without affecting the other sections. For example, a high order system model can be decomposed into sub-models, with fast and slow states. Such a decomposition can improve numerical performance, as will be demonstrated in later chapters.

platform, thus resulting in an emulator capable of very high sampling rates. The disadvantage of a purely FPGA based solution is the lack of configuration options. Often the design is custom built for a specific type of system model. As a tool, an emulator should provide flexibility, so that a whole range of systems can be tested using a single platform.

1.3.2 Design Parameters

Due to the high sampling rate, and low input latency requirements, an FPGA is the most suitable solution to perform the state calculations. An FPGA based solution also allows parallelism, which is useful for solving high order differential/difference equations.

On the other hand, a processor/DSP based solution provides more flexibility, reconfigurability and the ability to perform complex computations. A hybrid solution, employing HW-SW co-design with a soft-core processor, allows both requirements to be fulfilled.

The final design parameters for this thesis were chosen on the basis of the above rationale and the requirements for an HIL simulator, as given in the previous section.

1. Emulator and controller implemented, each, on an Avnet Lx9 Microboard, with a Xilinx Spartan 6-Lx9 FPGA

2. HW-SW co-design, with an open-source soft-core processor IP

3. Support for multi-channel PWM input signals, with switch frequencies up to 100 kHz

4. ADC emulator with support for multiple channels, and SPI slave interface

5. Real-time logging of state variables

6. Generic and modular architecture

7. Reconfigurability of firmware¹ from software

1.4 Thesis Organization

This thesis will describe the design decisions involved in the development of an emulator for a generic, high frequency switched power system. It will also present results that validate the emulator against measurement data from real hardware. The project was carried out at the Control of Power Electronics research group at the Automatic Control Laboratory of ETH Zurich. Research at the group is focused on development and real-time implementation of control algorithms for drive systems and power converters.

¹ This thesis will, henceforth, use firmware (FW) to refer to digital logic in FPGA (written in

This chapter presents an introduction, with a background of the applications for such an emulator, as well as the design specifications that would meet the objectives. Chapter 2 will discuss the modeling aspects of a generic switched power electronic system. It will also discuss the discretization of the model. Chapter 3 will present the design of the firmware architecture that forms the backbone of both the emulator and controller boards. It will also present the soft-core processor selected for this project, as well as its interfacing with the custom logic blocks in the FPGA.

Chapter 4 will describe the logic block that performs state calculations in firmware. This block is the most important contribution of this thesis. Chapter 5 will present the procedure to validate the emulator against real hardware and present the results. Chapter 6 will present the results of the state calculation logic, its capabilities in terms of maximum sampling rate and its resource utilization on different FPGAs. It will also present the effects of quantization, and the proposed method to mitigate them. Finally, Chapter 7 will present the conclusion and recommendations for future work.

Chapter 2

Generic Model for Power Electronic Systems

2.1 Design Considerations

The emulator essentially implements a mathematical model of the switched electronic system, computed in discrete time. Since the emulator is not limited to modeling a specific type of power electronic system, it is imperative that the model of the emulated system have a generic form. In order to make the computations feasible on an FPGA-based platform, the model should also fulfill some fundamental requirements:

2.1.1 State Space Representation

Dynamic systems can be presented as a set of first-order differential equations that describe the relation between inputs and outputs of a system, via state variables, as given in Eq 2.1. Multiple states are defined in vector form, where the order of the vectors is equal to the order of the system. The state transfer function $f(t, x, u)$ and the output transfer function $h(t, x, u)$ can each be linear or nonlinear, and time dependent or time independent.

In the context of power electronic systems, the order (number of states) of the system typically depends on the number of energy storage elements in the circuit, for example, capacitors, inductors, electromagnetic coils/solenoids. In case of drives, mechanical energy is also stored as inertia.

The advantage of using state space representation is that it can be used to describe a system of any order. It can also describe Multiple-Input Multiple-Output (MIMO) systems. The vector representation of state variables and transfer functions particularly suits hardware implementation, as will be apparent in later chapters.

\[
\begin{aligned}
\dot{x}(t) &= f(t, x(t), u(t)); \quad x \in \mathbb{R}^n;\; u \in \mathbb{R}^p \\
y(t) &= h(t, x(t), u(t)); \quad y \in \mathbb{R}^q
\end{aligned}
\tag{2.1}
\]

2.1.2 Linearity

FPGAs are most suited for simple mathematical operations, like additions and multiplications. Modern FPGAs can make use of dedicated hardware resources to perform these calculations at very high clock frequencies. It makes sense to exploit these capabilities, and avoid more complex operations. Other mathematical operations - like quadratics, exponents, trigonometric operations etc. - require complex IP blocks that often require multiple clock cycles to execute and are not easily scalable. Therefore, it is best to restrict FPGA based model calculations to linear systems.

Linear state space systems can be represented through matrix multiplications, as given in Eq 2.2, where dim[A] = n × n; dim[B] = n × p; dim[C] = q × n; and dim[D] = q × p. Matrix multiplications are performed as a series of Multiply-Accumulate operations, which is ideally suited for FPGA implementations.

\[
\begin{aligned}
\dot{x}(t) &= A x(t) + B u(t); \\
y(t) &= C x(t) + D u(t);
\end{aligned}
\tag{2.2}
\]
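To make the multiply-accumulate structure of Eq 2.2 concrete, the following minimal sketch evaluates the right-hand side of the state equation as a sequence of MACC operations, one accumulator per matrix row. It is a software illustration only and not the firmware implementation described later; the function names and the numeric values in the usage example are hypothetical.

```python
def macc_matvec(matrix, vector):
    """Matrix-vector product computed as one multiply-accumulate chain per row."""
    result = []
    for row in matrix:
        acc = 0.0                      # accumulator register for this row
        for coeff, value in zip(row, vector):
            acc += coeff * value       # one MACC operation
        result.append(acc)
    return result


def state_derivative(a, b, x, u):
    """Right-hand side of the state equation: A x + B u."""
    ax = macc_matvec(a, x)
    bu = macc_matvec(b, u)
    return [p + q for p, q in zip(ax, bu)]


if __name__ == "__main__":
    A = [[0.0, 1.0], [-2.0, -3.0]]     # n x n, here n = 2
    B = [[0.0], [1.0]]                 # n x p, here p = 1
    print(state_derivative(A, B, x=[1.0, 2.0], u=[4.0]))
```

Each row needs n + p multiply-accumulate operations and the rows are independent of each other, which is why the computation maps naturally onto parallel hardware multipliers.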

2.1.3 Discrete Time

Digital systems are inherently discrete. Although methods exist to solve differential equations analytically, they are specific to certain types of systems and involve complex mathematical operations. Therefore, the model calculations need to be performed in discrete time. State space representation in discrete time consists of a set of first order difference equations (Eq 2.3).

\[
\begin{aligned}
x_{k+1} &= A_d x_k + B_d u_k; \\
y_k &= C x_k + D u_k; \\
\text{where } x_k &= x(kT_s)
\end{aligned}
\tag{2.3}
\]

The sampling period ($T_s$) and the discretization method depend on a number of factors, most importantly the frequencies associated with the system's eigenvalues. Essentially, a low sampling frequency for a system with high frequency dynamics will result in a loss of accuracy. This issue is very relevant for switched power electronic systems, many of which include both very high and very low frequency dynamics.

There also exist variable step solvers [8], which use time-varying sampling rates and are, in fact, best suited for systems with a large spread in the frequencies of their eigenvalues. However, this thesis will be limited to fixed-step solvers.

For a given sampling rate, the discretization method of the differential equation can also affect the accuracy of the model, and can even affect stability of the modeled system. Discretization methods range from simple Euler and trapezoidal methods, to more complex, higher order Runge-Kutta methods. The simplest Forward Euler method for a linear first order ordinary differential equation (ODE) is given in Eq 2.4.

\[
\begin{aligned}
\dot{x}(t) &\approx \frac{x_{k+1} - x_k}{T_s} \\
x_{k+1} &= a_d x_k + b_d u_k = (1 + aT_s)\,x_k + bT_s\,u_k \\
y_k &= c x_k + d u_k
\end{aligned}
\tag{2.4}
\]
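The effect of the sampling period on accuracy and stability can be checked numerically. The sketch below applies the Forward Euler update to a scalar first order ODE of the form x' = a x + b u with a constant input, and compares it with the closed-form solution. The coefficient values are hypothetical and chosen only for illustration.

```python
import math


def forward_euler(a, b, x0, u, ts, steps):
    """Iterate x[k+1] = (1 + a*Ts) x[k] + b*Ts u[k] for a constant input u."""
    a_d = 1.0 + a * ts
    b_d = b * ts
    x = x0
    for _ in range(steps):
        x = a_d * x + b_d * u
    return x


def exact_solution(a, b, x0, u, t):
    """Closed-form solution of x' = a x + b u for constant u and a != 0."""
    x_inf = -b * u / a
    return x_inf + (x0 - x_inf) * math.exp(a * t)


def is_stable(a, ts):
    """The Forward Euler iteration of a scalar system is stable only if |1 + a*Ts| < 1."""
    return abs(1.0 + a * ts) < 1.0


if __name__ == "__main__":
    a, b, u, t_end = -1000.0, 1000.0, 1.0, 1e-3   # hypothetical values
    reference = exact_solution(a, b, 0.0, u, t_end)
    for ts, steps in ((1e-4, 10), (1e-6, 1000)):
        approx = forward_euler(a, b, 0.0, u, ts, steps)
        print(ts, approx, abs(approx - reference), is_stable(a, ts))
    print(is_stable(a, 3e-3))   # too large a step: |1 + a*Ts| = 2
```

In this example, reducing the step from 100 µs to 1 µs reduces the error at t = 1 ms by roughly two orders of magnitude, while a step of 3 ms makes the iteration diverge even though the continuous system is stable.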

2.2 Example Systems

It makes sense to consider what type of model is most suitable to simulate switched power electronic systems, by looking at the mathematical models for some examples of such systems.

2.2.1 Multi-Phase DC-DC Converter

The multi-phase DC-DC converter is a higher-order equivalent of a simple half-bridge buck/boost/buck-boost converter [14] [18]. The reasoning behind the use of multiple channels of energy storage elements (inductors) is to be able to provide higher power - using a smaller filter circuit - and to reduce switching harmonics. This is achieved by ensuring that the PWM signal to each phase leg is phase-shifted with respect to the previous phase leg by a given phase shift value.

Figure 2.1 presents the circuit diagram for a 3-phase DC-DC converter, with IGBTs controlling the switching. The inputs to the system are the PWM switch pulses ($G_1$, $G_2$, $G_3$) provided at the gates of the transistors $T_1$, $T_2$, $T_3$ respectively. Note that the switch pulses to transistors $T_4$, $T_5$, $T_6$ are the exact complement of those to transistors $T_1$, $T_2$, $T_3$ respectively. The state vector is composed of the current through each phase leg and the voltage across the capacitor. Typically, the control goal is to ensure that the voltage $U_0$ at the output is maintained constant for a varying load. Hence, $R_L$ is a variable parameter.

The set of differential equations for the model is given in Eq 2.5.

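\[
\begin{aligned}
\dot{i}_1 &= -\frac{R_1}{L_1}\,i_1 - \frac{1}{L_1}\,u_0 + \frac{U_{DC}}{L_1}\,G_1 \\
\dot{i}_2 &= -\frac{R_2}{L_2}\,i_2 - \frac{1}{L_2}\,u_0 + \frac{U_{DC}}{L_2}\,G_2 \\
\dot{i}_3 &= -\frac{R_3}{L_3}\,i_3 - \frac{1}{L_3}\,u_0 + \frac{U_{DC}}{L_3}\,G_3 \\
\dot{u}_0 &= \frac{1}{C}\,i_1 + \frac{1}{C}\,i_2 + \frac{1}{C}\,i_3 - \frac{1}{C R_L}\,u_0
\end{aligned}
\tag{2.5}
\]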
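As a rough illustration of how such a converter model behaves, the sketch below integrates a 3-phase converter with Forward Euler steps and interleaved PWM gate signals. It assumes the usual circuit equations for Figure 2.1: each phase current is driven by U_DC·G_k/L_k and opposed by its series resistance and the output voltage, and the capacitor is charged by the sum of the phase currents and discharged through R_L. All component values, and the helper names, are hypothetical and are not the parameters of the hardware setup used in this thesis.

```python
# Hypothetical component values, chosen only for illustration.
PARAMS = {"U_DC": 24.0, "R": 0.1, "L": 100e-6, "C": 100e-6, "R_L": 10.0, "f_sw": 100e3}


def converter_step(x, gates, p, ts):
    """One Forward Euler step of the 3-phase converter; x = [i1, i2, i3, u0]."""
    i1, i2, i3, u0 = x
    next_currents = [
        i + ts * (-(p["R"] / p["L"]) * i - u0 / p["L"] + p["U_DC"] * g / p["L"])
        for i, g in zip((i1, i2, i3), gates)
    ]
    next_u0 = u0 + ts * ((i1 + i2 + i3) / p["C"] - u0 / (p["C"] * p["R_L"]))
    return next_currents + [next_u0]


def mean_output_voltage(p, duty, n_per_period, n_periods):
    """Simulate from rest and return the output voltage averaged over the last period."""
    ts = 1.0 / (p["f_sw"] * n_per_period)
    shift = n_per_period // 3                  # phase legs interleaved by a third of a period
    x = [0.0, 0.0, 0.0, 0.0]
    total = n_per_period * n_periods
    last_period = []
    for n in range(total):
        gates = [
            1 if ((n - k * shift) % n_per_period) < duty * n_per_period else 0
            for k in range(3)
        ]
        x = converter_step(x, gates, p, ts)
        if n >= total - n_per_period:
            last_period.append(x[3])
    return sum(last_period) / len(last_period)


if __name__ == "__main__":
    # 120 samples per switching period corresponds to a 12 MHz update rate at 100 kHz.
    print(mean_output_voltage(PARAMS, duty=0.5, n_per_period=120, n_periods=1000))
```

With these assumed values, the averaged output voltage settles close to the duty cycle times U_DC, reduced slightly by the voltage drop across the phase resistances.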

2.2.2 Three phase Induction Motor Drive

The induction motor (IM) is the most common type of motor, in use in a whole range of applications. Although the IM is traditionally operated from an AC source, most modern applications employ switched drive circuits, connected to a DC voltage source. This allows a more flexible current/speed control scheme. On the other hand, closed loop control is absolutely necessary to be able to control the IM.

Figure 2.1: Circuit diagram of a Multi-phase DC-DC Converter with 3 phase legs

For the modeling and analysis of 3-phase induction motors, space vectors are used. Space vectors are time-varying (rotating) phasors that provide a mathematical conversion from a 3-D representation to a 2-D one [9, pp.84-86]. They can be visualized in the complex domain as $\vec{x} = x^{\alpha} + j x^{\beta}$. Figure 2.2 provides a diagram of the equivalent circuit of an induction motor, using the Ideal Rotating Transformer (IRTF) approach and a 2-inductance model [11]. $L_M$ is the equivalent magnetizing inductance, and $L_\sigma$ is the equivalent leakage inductance. The dynamic variables (currents, magnetic flux) are all space vectors.

Since the magnetic flux in the rotor (right side of the diagram) is rotating in physical space, a rotating coordinate system is needed to define the states on the rotor side. The states defined in a rotating coordinate reference are denoted as $\vec{x}^{xy}$ or $\vec{x}^{dq}$. The relationship between states in the stationary and rotating reference frames is given by $\vec{x}^{xy} = \vec{x}\,e^{-j\omega_s t}$.
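The change of reference frame can be illustrated with complex numbers, treating a space vector as a single complex value. The sketch below assumes the common convention that a stationary-frame vector is mapped to the rotating frame by multiplying with exp(-jθ), where θ is the angle of the rotating frame; the function names and the 50 Hz example are hypothetical.

```python
import cmath
import math


def to_rotating(x_stationary, theta):
    """Express a stationary-frame space vector in a frame rotated by angle theta."""
    return x_stationary * cmath.exp(-1j * theta)


def to_stationary(x_rotating, theta):
    """Inverse transformation, back to the stationary frame."""
    return x_rotating * cmath.exp(1j * theta)


if __name__ == "__main__":
    omega = 2.0 * math.pi * 50.0          # hypothetical angular velocity in rad/s
    for t in (0.0, 0.002, 0.005):
        # A vector of magnitude 2 rotating at omega in the stationary frame ...
        x_alpha_beta = 2.0 * cmath.exp(1j * omega * t)
        # ... is a constant when observed from a frame rotating at the same speed.
        print(t, x_alpha_beta, to_rotating(x_alpha_beta, omega * t))
```

A quantity that oscillates at the rotation frequency in the stationary frame therefore becomes a slowly varying (here constant) quantity in the rotating frame, which is the property exploited when the angular velocity is later treated as a slowly varying parameter.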

The differential equations corresponding to the 2-inductance, IRTF based model of the IM are given in Eq 2.6 through Eq 2.9.

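\[
\vec{u}_s = R_s \vec{i}_s + \frac{d\vec{\psi}_s}{dt}
\tag{2.6}
\]

\[
\vec{\psi}_s = L_\sigma \vec{i}_s + \vec{\psi}_M
\tag{2.7}
\]

\[
\vec{\psi}_M^{xy} = L_M \left( \vec{i}_s^{xy} - \vec{i}_R^{xy} \right)
\tag{2.8}
\]

\[
\frac{d\vec{\psi}_M^{xy}}{dt} = R_R\,\vec{i}_R^{xy}
\tag{2.9}
\]

Figure 2.2: Ideal Transformer (IRTF) based diagram of Induction Motor

The differential equations can be re-arranged as a set of first order ODEs given in Eq 2.10, where vector $x$ corresponds to the 2-D space vector $\vec{x}$.

\[
\begin{aligned}
\dot{i}_s(t) &= -\frac{R_s + R_R}{L_\sigma}\,i_s(t) + \frac{R_R}{L_\sigma L_M}\,\psi_M(t) - \frac{1}{L_\sigma}\,j\,\omega_s(t)\,\psi_M(t) + \frac{1}{L_\sigma}\,u_s(t) \\
\dot{\psi}_M(t) &= R_R\,i_s(t) - \frac{R_R}{L_M}\,\psi_M(t) + j\,\omega_s(t)\,\psi_M(t)
\end{aligned}
\tag{2.10}
\]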

2.3 Linear Parameter Varying Systems

Based on the above examples, it is apparent that it is not sufficient to express switched power electronic systems using simple Linear Time Invariant (LTI) state space systems. Although mostly linear, there are some non-linearities that cannot be ignored. However, the fact that some dynamic states have much lower frequency eigenvalues than others can be exploited by using a specific type of linear system, i.e. Linear Parameter Varying (LPV) systems.

LPV systems are similar to LTI systems, except that they include an extra term in the differential equation, with a time-varying parameter. Eq 2.11 presents the general form of an LPV system.

\[
\begin{aligned}
\dot{x}(t) &= A_0 x(t) + \sum_{i=1}^{m} A_i x(t)\,z_i + B u; \quad z \in \mathbb{R}^m \\
y(t) &= C x + D u;
\end{aligned}
\tag{2.11}
\]

It must be kept in mind, though, that the time-varying parameter must vary slowly in time, compared to the state variables. This assumption is quite valid for the example systems presented above. For example, the load resistance $R_L$ in the multi-phase DC-DC converter, and the stator ($\omega_s$) and rotor ($\omega_r$) angular velocities in the induction motor, are reasonable choices for time-varying parameters.
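As a worked illustration of this parameter choice, the multi-phase DC-DC converter of Section 2.2.1 can be cast into the LPV form of Eq 2.11. The sketch below assumes the usual circuit equations for Figure 2.1, in which the load draws a current $u_0/R_L$ from the output capacitor and each phase leg is driven by $U_{DC} G_k / L_k$. The choice $z_1 = 1/R_L$ is an assumption made here so that the parameter enters the equations linearly; it is not stated explicitly in the text.

```latex
\[
x = \begin{bmatrix} i_1 \\ i_2 \\ i_3 \\ u_0 \end{bmatrix}; \quad
u = \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix}; \quad
z_1 = \frac{1}{R_L}
\]
\[
A_0 = \begin{bmatrix}
-\frac{R_1}{L_1} & 0 & 0 & -\frac{1}{L_1} \\
0 & -\frac{R_2}{L_2} & 0 & -\frac{1}{L_2} \\
0 & 0 & -\frac{R_3}{L_3} & -\frac{1}{L_3} \\
\frac{1}{C} & \frac{1}{C} & \frac{1}{C} & 0
\end{bmatrix}; \quad
A_1 = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & -\frac{1}{C}
\end{bmatrix}; \quad
B = \begin{bmatrix}
\frac{U_{DC}}{L_1} & 0 & 0 \\
0 & \frac{U_{DC}}{L_2} & 0 \\
0 & 0 & \frac{U_{DC}}{L_3} \\
0 & 0 & 0
\end{bmatrix}
\]
```

Under these assumptions only a single entry of $A_1$ is non-zero, so a change in load affects one coefficient of the model while $A_0$ and $B$ remain constant.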

Following this reasoning, the LPV representation of the induction motor is given in Eq 2.12 and Eq 2.13. Note that the state variables are complex (2-D) space vectors, defined in the stationary (stator) reference frame. This is due to the fact that the input to the system is the stator voltage ($\vec{u}_s = u_s^{\alpha} + j u_s^{\beta}$).

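\[
x = \begin{bmatrix} i_s^{\alpha} \\ i_s^{\beta} \\ \psi_M^{\alpha} \\ \psi_M^{\beta} \end{bmatrix}; \quad
u = \begin{bmatrix} u_s^{\alpha} \\ u_s^{\beta} \end{bmatrix}; \quad
z_1 = \omega_s
\tag{2.12}
\]

\[
A_0 = \begin{bmatrix}
-\frac{R_s + R_R}{L_\sigma} & 0 & \frac{R_R}{L_\sigma L_M} & 0 \\
0 & -\frac{R_s + R_R}{L_\sigma} & 0 & \frac{R_R}{L_\sigma L_M} \\
R_R & 0 & -\frac{R_R}{L_M} & 0 \\
0 & R_R & 0 & -\frac{R_R}{L_M}
\end{bmatrix}; \quad
A_1 = \begin{bmatrix}
0 & 0 & 0 & \frac{1}{L_\sigma} \\
0 & 0 & -\frac{1}{L_\sigma} & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0
\end{bmatrix}; \quad
B = \begin{bmatrix}
\frac{1}{L_\sigma} & 0 \\
0 & \frac{1}{L_\sigma} \\
0 & 0 \\
0 & 0
\end{bmatrix}
\tag{2.13}
\]

2.3.1 Discretization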

For LTI systems, the corresponding analytical discrete equivalent of the state transition matrices, based on the Zero Order Hold (ZOH) method, is given in Eq 2.14.

\[
A_d = e^{A T_s}; \quad B_d = \left( \int_0^{T_s} e^{A\tau}\,d\tau \right) B;
\tag{2.14}
\]

\[
e^{A T_s} = I + A T_s + \frac{A^2 T_s^2}{2!} + \dots
\tag{2.15}
\]

Since LPV systems are more complex than LTI systems, the ZOH method cannot simply be applied to the LPV model, specifically to the transformation of the state transition matrices corresponding to the time-varying parameters ($A_i$; $i \geq 1$). Finding the analytical discrete equivalent of these state transition matrices using the ZOH method is not a trivial problem.

Instead, a simplification such as the Forward Euler method can be applied, which approximates the matrix exponential by a truncated Taylor series (Eq 2.15). Since the Forward Euler method is a first order method, the expansion is limited to the first 2 terms. Thus, it is preferable to discretize LPV systems using the Forward Euler method. The approach is reasonable, especially since the implementation can be performed at very high sampling rates. For low values of $T_s$, the accuracy of the Forward Euler method approaches that of higher order discretization methods.
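The size of the truncation error can be examined numerically. The following sketch evaluates the Taylor series of the matrix exponential with a configurable number of terms and compares the two-term (Forward Euler) approximation against a 20-term reference. The test matrix and the helper names are hypothetical and serve only to show how the error shrinks with the sampling period.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def mat_mul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def expm_taylor(a, ts, terms):
    """Sum of the first `terms` terms of I + A*Ts + (A*Ts)^2/2! + ..."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        product = mat_mul(term, a)
        term = [[v * ts / k for v in row] for row in product]   # (A*Ts)^k / k!
        result = [[p + q for p, q in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def euler_error(a, ts):
    """Largest entry-wise gap between the 2-term approximation and a 20-term reference."""
    first_order = expm_taylor(a, ts, 2)
    reference = expm_taylor(a, ts, 20)
    return max(
        abs(p - q) for ra, rb in zip(first_order, reference) for p, q in zip(ra, rb)
    )


if __name__ == "__main__":
    A = [[0.0, 1.0], [-4.0, 0.0]]     # undamped oscillator with natural frequency 2 rad/s
    for ts in (0.1, 0.01, 0.001):
        print(ts, euler_error(A, ts))
```

For this example the error falls by roughly a factor of 100 for each tenfold reduction of the sampling period, consistent with the first neglected term being proportional to the square of the sampling period.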

The difference equation corresponding to a generic LPV system, using Forward Euler discretization method is given in Eq 2.16. The emulator, presented in this work, implements computation for this model structure.

$$x_{k+1} = x_k + A_{0d}\, x_k + \sum_{i=1}^{m} A_{id}\, x_k z_i + B_d u_k, \quad \text{where } A_{id} = A_i T_s; \; B_d = B T_s \tag{2.16}$$
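The update in Eq 2.16 can be sketched in a few lines of floating-point Python. This is only a reference model under stated assumptions, not the fixed-point FPGA logic: the function name `lpv_euler_step` and the nested-list matrix layout are hypothetical choices made for this illustration.

```python
def lpv_euler_step(x, u, z, A0, A_list, B, Ts):
    """One Forward Euler update of a generic LPV model (Eq 2.16).

    x: states (length N), u: inputs (length P), z: parameters (length m),
    A0: N x N matrix, A_list: [A1, ..., Am], B: N x P matrix, Ts: sample time.
    Matrices are continuous-time; scaling by Ts yields A_id and B_d.
    """
    n = len(x)
    x_next = []
    for j in range(n):
        # Constant part: row j of A0 times x
        acc = sum(A0[j][c] * x[c] for c in range(n))
        # Parameter-dependent part: z_i * (row j of A_i times x)
        for Ai, zi in zip(A_list, z):
            acc += zi * sum(Ai[j][c] * x[c] for c in range(n))
        # Input part: row j of B times u
        acc += sum(B[j][p] * u[p] for p in range(len(u)))
        x_next.append(x[j] + Ts * acc)
    return x_next
```

A floating-point model of this kind is mainly useful as a golden reference against which fixed-point results can be compared.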


Chapter 3

Firmware Architecture

3.1 Functional Overview

In order to validate the emulator, both emulator and controller were implemented on a digital platform. Figure 3.1 illustrates the built setup including the emulator and controller, implemented on separate FPGA boards.

The interfaces were designed so that the controller can be connected to a real power electronic system, after verification with the emulator, without any changes.

[Figure 3.1 (block diagram): an Emulator FPGA and a Controller FPGA, each with a soft-core processor, a state log memory and a user interface. Further blocks shown are the PWM modulator, the ADC emulator, the ADC interface, the SPI interface and the discrete system models S1 to Sm.]

Figure 3.1: HIL Simulation Setup implemented on two FPGAs


Figure 3.2: Avnet Lx9 Microboard, with a Xilinx Spartan6-Lx9 FPGA

Both modules include a user interface, via UART, for observing the state variables, which are stored in real time in the DDR memory.

HIL simulation often requires modification of both the emulator and the controller whenever the plant model changes. Therefore, for ease of use, it makes sense to implement the controller and the emulator on the same digital platform. The FW architecture is designed in a modular fashion, so that the only difference between the emulator and controller implementations is the instantiation of different logic blocks, while the underlying architecture remains the same.

Although the primary purpose of this project was not to implement a complex control algorithm, the FPGA was configured, so as to provide a decent test and development platform for a controller, as well as the emulator. This allowed a control engineer to develop a controller for a power electronic system, while simultaneously making suitable modifications to the model parameters of the desired system, which is a realistic scenario for rapid prototyping of power electronic systems.

3.2 Digital HW Platform

The hardware platform used for this thesis is an Avnet Lx9 Microboard (Figure 3.2). It is a low cost development board with a Xilinx Spartan6-Lx9 FPGA. The relevant specifications for the board are listed below:

FPGA

• Xilinx Spartan 6-Lx9
    – 9152 Logic cells
    – 16 DSP48 blocks
    – 64KB Block RAM

Off-Chip Memory

• 64 MBytes Low Power DDR (LPDDR) SDRAM
• 128Mb Multi-channel SPI Flash

Interfaces

• USB-JTAG
• USB-UART
• 2x12 Peripheral Module (PMOD) headers

The PMOD digital I/O is used as the primary interface between the controller and emulator boards.

3.3 Soft-Core Processor

In order to support reconfigurability through SW, the most essential component of the FW architecture is a soft-core processor. Re-configuring SW is less time-consuming, since the compilation time for SW is significantly less than the synthesis time for HDL. Therefore, it is desirable to allow the logic blocks in the FW to be configured using SW-controlled parameters.

In addition to configurability, a SW environment is more conducive to the implementation of complex, logic-intensive control sequences. This is, of course, done at the cost of speed.

Soft-core processor architectures are implemented using the resources on the FPGA. Since the processor and the FW are implemented on the same chip, the interface between them is much simpler and faster than off-chip interfaces. Another major advantage of soft-core processor IPs is that they can be configured to provide only the resources that are required by the application.

3.3.1 IP Selection

Soft-core processor architectures are available as Intellectual Property (IP) cores that can be instantiated in the FPGA design. For the purpose of this thesis, five soft-core processor IPs were considered. The selection of the soft-core processor IP was based on the following criteria:

1. Support for real-time operation, and deterministic execution, in order to meet hard real-time requirements.

2. Support for interface with custom peripherals/HW accelerators, implemented on FPGA using HDL.

3. Ability to support all arithmetic functionalities, which are required for control algorithms.

4. Compatibility with Xilinx Spartan 6 FPGA; Specifically the Spartan6-LX9 FPGA.

5. Ease of use, and preferably open-source.

| IP | Xilinx Microblaze | OpenRisc 1200 | LatticeMico32 | aeMB | OpenFire |
|---|---|---|---|---|---|
| Open Source | No | Yes | Partially | Yes | Yes (Under Development) |
| Bus Architecture | OPB/PLB/AXI | Wishbone | Wishbone | Wishbone/FSL | OPB |
| Optimized for | Xilinx FPGA | ASIC | Lattice FPGA | Xilinx FPGA | N/A |
| Documentation | Extensive | Limited | Extensive | Limited | N/A |
| Reliability | High | Low | High | Tested for Virtex family | Very Low |
| IDE | Yes | No | Yes | No | No |
| Customization | XPS GUI Interface | Source Code | GUI Interface and Source Code | Source Code | |

Table 3.1: Comparison of Soft-core Processor IPs

Note that the emphasis is placed on real-time operation, not complexity, although the two do not have to be mutually exclusive. Since the processor is not the main calculation unit, and needs to be implemented using limited resources, it is possible to choose a simple architecture over a more complex one. This also allows for fast execution rates.

Table 3.1 presents a comparison of the IPs that were reviewed. As far as optimization and usability are concerned, the Microblaze is the best choice. However, it requires the Xilinx EDK toolchain, which is only available under a paid license. Pre-built Microblaze core configurations are available for the LX9 Microboard. However, none of these configurations provide access to the AXI bus. Hence, it would not have been possible to connect the processor with custom logic blocks. A workaround for this problem is to use the GPIO ports, but that provides limited (or low speed, in case of serial interface) access to custom peripherals.

Among the open-source architectures, the OpenRisc 1200 provides the most options, in terms of peripherals and customizability. However, the documentation is very limited. It is a non-trivial task to modify the source code, based on any desired customizations to the processor core. Additionally, OpenRisc 1200 is designed for implementations on ASICs, and as a result, consumes more resources on an FPGA.


The LatticeMico32 (LM32) is the Lattice counterpart of the Xilinx Microblaze. However, it is available (with source code) under an open-source license. The processor is a 32-bit Harvard architecture with a 6-stage pipeline. It was selected for this project, since it provides an extensive amount of documentation, while being simple enough for an optimized implementation on an FPGA.

The most significant limitation of this processor architecture is the lack of a floating point unit (FPU). The alternative is to use the conventional ALU to perform calculations using fixed-point arithmetic in SW. This adds to the design effort, as factors like saturation, overflow and quantization need to be taken into account. However, fixed-point arithmetic is in any case the only practical choice for the RTL-level calculations, given the limited area resources on the FPGA and the need to keep latency to a minimum. Therefore, it is acceptable to rely solely on fixed-point arithmetic, even in SW.
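The saturation, overflow and quantization concerns mentioned above can be illustrated with a small fixed-point model. The sketch below assumes a signed 16-bit word with 15 fractional bits (Q1.15); this format and all helper names are assumptions made for the example, since the text does not state the scaling actually used.

```python
Q = 15                                   # assumed number of fractional bits (Q1.15)
INT_MIN, INT_MAX = -(1 << 15), (1 << 15) - 1

def saturate(v):
    """Clamp to the signed 16-bit range instead of wrapping on overflow."""
    return max(INT_MIN, min(INT_MAX, v))

def to_fixed(x):
    """Quantize a real number to Q1.15 with rounding and saturation."""
    return saturate(int(round(x * (1 << Q))))

def to_float(v):
    """Convert a Q1.15 integer back to a real number."""
    return v / (1 << Q)

def fx_mul(a, b):
    """16 x 16 -> 32-bit product, rounded and rescaled back to Q1.15."""
    prod = a * b
    return saturate((prod + (1 << (Q - 1))) >> Q)

def fx_add(a, b):
    """Saturating addition of two Q1.15 values."""
    return saturate(a + b)
```

For example, `fx_mul(to_fixed(0.5), to_fixed(0.5))` yields the Q1.15 code for 0.25, while `fx_add(30000, 10000)` saturates at 32767 rather than wrapping to a negative value.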

3.3.2 SoC Configuration

Lattice MicoSystem is an Eclipse-based IDE, which is used to configure the System-on-Chip architecture with the LM32 processor. The IDE is also used to compile SW, along with the HW-specific libraries provided by Lattice. For this project, the following configuration was used:

• Bus Architecture
    – Wishbone Bus
    – Slave Side Arbitration with Fixed priority
• Memory
    – Inline Memory
        ∗ Separate Data and Instruction bus
        ∗ Block RAM
        ∗ Single Cycle latency
    – Caches Disabled
• Arithmetic Unit
    – HW Multiplier
    – HW Divider
    – Barrel Shifter
• Peripherals
    – GPIO
    – UART
    – WB Slave Pass-through for Custom Logic blocks

After customizing the SoC in the GUI-based tool provided by Lattice, it is possible to further modify the source code (VHDL files) for the processor IP. Some modifications are required, for example, to make the IP compatible with Xilinx FPGAs. This involved instantiating Xilinx-specific primitives (for block RAM, buffers, arithmetic units etc.) in place of Lattice primitives. Other modifications were related to optimization. Appendix B contains a complete description of the steps required to make LM32 compatible with Xilinx FPGAs.

Wishbone Bus

LM32 uses the Wishbone bus (WB), which is an open-source bus architecture, for communication with peripherals. The Wishbone bus supports single or burst read/write. It uses an acknowledge-based handshaking protocol to ensure reliable communication with peripherals [10]. Since the processor uses inline memory for data and instructions, the bus interface only has a single master, i.e. the data controller of the CPU. This eliminates the need for arbitration on the bus.

3.3.3 WB Slave Interface with Custom Logic Blocks

The LM32 IP is configured to allow external access to the Wishbone bus signals. In the FW, the bus signals are used to interface the Master (Data Controller in LM32) to the custom logic blocks. Figure 3.3 illustrates the interface signals.

This interface is designed to support a generic number of slave blocks. The blocks are memory-mapped to the address bus. The lower bits of the address bus are used to select the individual registers in a block, while the upper bits are used to generate chip-select (CS) signals, that enable the communication to/from a specific block.
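The memory-mapped decoding described above can be sketched as follows. The bit widths (`REG_ADDR_BITS`, `NUM_BLOCKS`) and the function name `decode` are hypothetical values chosen for the illustration; the actual address map of the design is not given in this section.

```python
REG_ADDR_BITS = 4   # assumed: lower 4 bits select one of 16 registers in a block
NUM_BLOCKS = 4      # assumed number of slave blocks

def decode(addr):
    """Split a word address into one-hot chip-selects and a register index."""
    reg = addr & ((1 << REG_ADDR_BITS) - 1)   # lower bits: register in block
    block = addr >> REG_ADDR_BITS             # upper bits: block number
    # One-hot CS vector; all zeros if the address maps to no block
    cs = [1 if i == block else 0 for i in range(NUM_BLOCKS)]
    return cs, reg
```

With these assumed widths, `decode(0x23)` selects register 3 of block 2, and an address beyond the last block asserts no chip-select at all.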

Clock Boundary Synchronization

The CPU cannot be clocked at a very high clock frequency; for the Lx9 the maximum clock frequency attainable was 67MHz. In order to run the custom calculation logic at higher sampling rates, higher clock rates are required. Therefore, the CPU and the rest of the logic are on different clock domains.

The only interface across the clock domains includes the Wishbone signals. To ensure that data across the clock boundaries is not corrupted, a 3-flop synchronization method [1] was used.

3.4 Analog-to-Digital Converter Emulation

Almost all control applications of power electronic systems involve closed-loop control. For this, the controller needs access to some of the states of the plant. In the context of power electronics, the measurable states of interest are typically the source and load voltages and the load current.

For digital control applications, the continuous-time states are sampled, via Analog-to-Digital (A/D) converters, into discrete-time, quantized values. In order to minimize the number of I/Os and accommodate multiple channels of sampled data, a serial communication protocol is used as the interface between the A/D converter and the controller.

Since verification of closed-loop control is a vital part of HIL simulations, the emulator provides an interface to the system states similar to a common A/D interface.

3.4.1 Specifications

For the purpose of the thesis, an A/D emulator was implemented. Moreover, an A/D master module was also implemented as part of the controller FW.

The protocols for communication between the ADC and the controller include, among others, parallel interfaces, the Serial Peripheral Interface (SPI), I2C and Delta-Sigma modulation. SPI is very flexible and requires a low number of I/Os while providing high data rates (up to several MSPS). Therefore, the SPI interface is implemented in this work. The interface for Texas Instruments' multi-channel A/D converters (Figure B.2) was used as a template for the emulator.

Figure 3.4 illustrates the architecture and interface between the SPI Master (Controller board) and SPI Slave (Emulator board). The interface supports a generic number of channels and a variable sampling rate, determined by the SPI clock.

[Figure 3.3 (block diagram): the LM32 CPU with its data and code memory acts as Wishbone master in the slow CPU_CLK domain. The Wishbone bus connects through the Wishbone slave interface and a synchronizer at the clock boundary to the slave blocks S0 to SNB, each with its own block registers, in the fast SYS_CLK domain. Signals shown: WE, ADDRESS, CS[0] to CS[NB], DATA_IN, DATA_OUT.]

The A/D blocks are instantiated as slave modules in the Wishbone Bus based architecture, with control registers for SW configuration of sampling rates etc., as well as an optional data interface.

For both master and slave blocks, the input signals are latched via a 2-flop synchronization method [1].

3.4.2 SPI Master

Figure 3.5 presents the state machine that controls the master block. The SPI clock is generated by a clock divider that also generates the corresponding clock enable (CE) signal that controls the Shift registers. This allows the use of a single clock domain in the design.
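The idea of deriving the SPI clock and a matching clock enable from one system clock can be modelled cycle by cycle. This is a behavioural sketch only: the generator name `clock_divider`, the divider parameter `div` and the choice of pulsing CE on the rising S_CLK edge are assumptions, not details taken from the design.

```python
def clock_divider(div, n_cycles):
    """Yield (s_clk, ce) once per system clock cycle.

    S_CLK toggles every `div` system clocks, so its period is 2*div cycles.
    CE is high for exactly one system clock cycle per S_CLK period, so the
    shift registers can stay in the system clock domain.
    """
    counter = 0
    s_clk = 0
    for _ in range(n_cycles):
        ce = 0
        if counter == div - 1:
            counter = 0
            s_clk ^= 1
            ce = 1 if s_clk == 1 else 0   # assumed: enable on rising S_CLK edge
        else:
            counter += 1
        yield s_clk, ce
```

Because the shift registers are clocked by the system clock and merely gated by CE, no second clock domain is introduced by the SPI clock.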

DSR_O is connected to the Master-Out-Slave-In (MOSI) signal, and is loaded with the address for the next channel. DSR_I latches in the Master-In-Slave-Out (MISO) signal, which is de-serialized to obtain the sample data; this is subsequently stored in the data register. The timing of the SPI frames is controlled through a control shift-register (CSR). The Chip-Select (CS) signal is used to synchronize the SPI frames.

[Figure 3.4 (block diagram): the A/D emulator (SPI slave) and the controller (SPI master) each contain a serial-in parallel-out register DSR_I, a parallel-in serial-out register DSR_O, a control FSM and data registers, with a 3-bit CHN_ADDR and 12-bit sample data. The two sides are connected through the S_CLK, CS_n, MOSI and MISO signals.]

[Figure 3.5 (state diagram): states IDLE (SPI_SS_n=1), SAMPLE_INIT (DSR_O_LD=1, SPI_SS_n=0, ADC_CHN++, CSR_IN=1), FRAME_START (DSR_O_LD=1, SPI_SS_n=0, ADC_CHN++, OUT_REG_EN=1, CSR_IN=1), FRAME_WAIT (SPI_SS_n=0) and SAMPLE_END (SPI_SS_n=0, OUT_REG_EN=1). Transition conditions: SAMPLING_TRIG, [CSR_OUT & !CHN_LIMIT], [CSR_OUT & CHN_LIMIT].]

Figure 3.5: Finite State Machine for SPI Master block

[Figure 3.6 (state diagram): states IDLE (REG_SEL=0), FRAME_START (CSR_IN=1) and FRAME_WAIT. Transition conditions: [!SPI_SS_n], [SPI_SS_n], [CSR_OUT].]

Figure 3.6: Finite State Machine for SPI Slave block

3.4.3 SPI Slave

Figure 3.6 presents the state machine that controls the slave block. The input SPI clock is used to generate a local clock enable (CE) signal that controls the shift registers.

DSR_I latches in the Master-Out-Slave-In (MOSI) signal, which is de-serialized to obtain the channel address. The address is subsequently used to load the DSR_O register with the corresponding state value from the data registers. DSR_O serializes the sample data and outputs it over the Master-In-Slave-Out (MISO) signal. The timing of the SPI frames is controlled through a control shift-register (CSR).
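The exchange between the two pairs of shift registers can be sketched at bit level. The model below rests on several assumptions that are not stated in the text: a 16-bit frame, MSB-first shifting, the 3-bit channel address in the top bits of the MOSI word, and the addressed sample being returned in the following frame. The names `spi_frame` and `AdcEmulator` are hypothetical.

```python
def spi_frame(mosi_word, miso_word, nbits):
    """Shift nbits in both directions, MSB first.

    Returns (word collected by the master DSR_I, word collected by the slave DSR_I).
    """
    dsr_i_master = 0
    dsr_i_slave = 0
    for bit in range(nbits - 1, -1, -1):
        mosi = (mosi_word >> bit) & 1              # serialized by master DSR_O
        miso = (miso_word >> bit) & 1              # serialized by slave DSR_O
        dsr_i_slave = (dsr_i_slave << 1) | mosi    # de-serialized on slave side
        dsr_i_master = (dsr_i_master << 1) | miso  # de-serialized on master side
    return dsr_i_master, dsr_i_slave


class AdcEmulator:
    """Slave side: returns the state value of the previously addressed channel."""

    def __init__(self, states):
        self.states = states   # one 12-bit sample per channel (8 channels here)
        self.pending = 0       # channel address received in the previous frame

    def frame(self, mosi_word, nbits=16):
        miso_word = self.states[self.pending] & 0xFFF        # load DSR_O
        sample, received = spi_frame(mosi_word, miso_word, nbits)
        self.pending = (received >> (nbits - 3)) & 0x7       # assumed address position
        return sample
```

With this model, a master that sends address 2 in one frame receives the sample for channel 2 in the next frame.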


Chapter 4

State Calculation Logic

Chapter 2 presented a generic discrete state space format for a Linear Parameter Varying (LPV) model of a switched power electronics system. The primary contribution of this thesis is the design of FPGA-based logic that implements the discrete LPV model at very high sampling rates.

The design is an expansion of the general method for matrix multiplications described in [19]. This project applies the same concept to the specific case of matrix-vector multiplications to compute the discrete state space LPV model. The first stage in the design is to decompose the model calculations into basic arithmetic operations.

4.1 Matrix-Vector Multiplication

The state space model for a discrete LPV system, from Chapter 2, can be re-written as given in Eq 4.1. $\Delta x$ is the step variation in the state vector $x$ over the sampling period. Note that henceforth only discrete-time systems will be considered; hence, the subscript $d$ in $A_d$ is dropped.

$$x_{k+1} = x_k + \Delta x, \quad \text{where } \Delta x = A_0 x_k + \sum_{i=1}^{m} A_i x_k z_i + B u_k \tag{4.1}$$

For an $N$th-order system ($x \in \mathbb{R}^N$) with $M$ time-varying parameters ($z \in \mathbb{R}^M$), computing $\Delta x$ involves $M + 2$ matrix-vector multiplications. This is illustrated in Eq 4.2, where $M = 2$.

$$\begin{aligned}
\begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \vdots \\ \Delta x_N \end{bmatrix} ={}&
\begin{bmatrix} a^0_{1,1} & a^0_{1,2} & \cdots & a^0_{1,N} \\ a^0_{2,1} & a^0_{2,2} & \cdots & a^0_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ a^0_{N,1} & a^0_{N,2} & \cdots & a^0_{N,N} \end{bmatrix}
\begin{bmatrix} x_{1,k} \\ x_{2,k} \\ \vdots \\ x_{N,k} \end{bmatrix}
+ z_1 \begin{bmatrix} a^1_{1,1} & a^1_{1,2} & \cdots & a^1_{1,N} \\ a^1_{2,1} & a^1_{2,2} & \cdots & a^1_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ a^1_{N,1} & a^1_{N,2} & \cdots & a^1_{N,N} \end{bmatrix}
\begin{bmatrix} x_{1,k} \\ x_{2,k} \\ \vdots \\ x_{N,k} \end{bmatrix} \\
&+ z_2 \begin{bmatrix} a^2_{1,1} & a^2_{1,2} & \cdots & a^2_{1,N} \\ a^2_{2,1} & a^2_{2,2} & \cdots & a^2_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ a^2_{N,1} & a^2_{N,2} & \cdots & a^2_{N,N} \end{bmatrix}
\begin{bmatrix} x_{1,k} \\ x_{2,k} \\ \vdots \\ x_{N,k} \end{bmatrix}
+ \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,P} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,P} \\ \vdots & \vdots & \ddots & \vdots \\ b_{N,1} & b_{N,2} & \cdots & b_{N,P} \end{bmatrix}
\begin{bmatrix} u_{1,k} \\ u_{2,k} \\ \vdots \\ u_{P,k} \end{bmatrix}
\end{aligned} \tag{4.2}$$

Equation 4.3 decomposes the matrix multiplications into sums of products. The total number of scalar multiplications required to compute $\Delta x$ for $N$ states, with $M$ time-varying parameters and $P$ inputs, is $N(P + N(M + 1))$.

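$$\Delta x_j = \sum_{i=0}^{M} \left[ z_i \sum_{n=1}^{N} a^i_{j,n}\, x_{n,k} \right] + \sum_{p=1}^{P} b_{j,p}\, u_{p,k}; \quad \forall j \in [1, N], \quad \text{with } z_0 = 1 \tag{4.3}$$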

From a computational perspective, it is not feasible to assign dedicated multiplier resources, which are very limited in FPGAs, for every multiplication in Eq 4.3. This is especially true when the number of states is high.
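To give a sense of scale, take the example dimensions used later in Table 4.1 ($N = 4$, $M = 2$, $P = 2$) and the multiplication count stated above:

$$N\,(P + N(M + 1)) = 4\,(2 + 4 \cdot 3) = 56$$

A fully parallel design would thus need 56 multipliers for this small example, whereas the Spartan6-LX9 provides only 16 DSP48 blocks.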

Instead, it makes sense to compute the sum of products iteratively, using an accumulator. Equation 4.4 expresses the computation of the step in the state variable $x_j$ through an accumulation operation. The coefficients $c^j_l$ will be defined in the next section. The updated value of the state variable is obtained after $P + N(M + 1)$ iterations, as shown in Eq 4.5.

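$$\Delta x^{l+1}_j = \Delta x^{l}_j + c^{j}_{l}\, x^{l}_j; \quad \forall j \in [1, N] \tag{4.4}$$

$$x_{j,k+1} = x_{j,k} + \Delta x^{N(M+1)+P}_j; \quad \forall j \in [1, N]; \qquad \Delta x^{0}_j = x_{j,k} \tag{4.5}$$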

4.2 Decomposition of the System Matrices

In Eq 4.4, the state variables $x^l_j$ are multiplied with the coefficients $c^j_l$ during the $l$th iteration. These coefficients are derived from the state transition ($A_i$) and the input ($B$) matrices. The aim is to identify independent multiplications that can be performed, in parallel, during each iteration.


Consider the matrix-vector product in Eq 4.6, with $N^2$ multiplications. For the multiplications to run in parallel, each must use a distinct pair of operands. Therefore, if the matrix-vector product is decomposed into $N$ parallel multiplications per iteration, the corresponding coefficients are obtained from the shifted diagonals of the matrix $A$.

Equation 4.7 illustrates this, where every column corresponds to one iteration. The operand $x^l_j$ multiplied with each coefficient is unique within an iteration and rotates from one iteration to the next.

$$Ax = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,N} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N,1} & a_{N,2} & \cdots & a_{N,N} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \tag{4.6}$$

$$\begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_{N-1} \\ C_N \end{bmatrix} = \begin{bmatrix} a_{1,1} & a_{1,N} & \cdots & a_{1,2} \\ a_{2,2} & a_{2,1} & \cdots & a_{2,3} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N-1,N-1} & a_{N-1,N-2} & \cdots & a_{N-1,N} \\ a_{N,N} & a_{N,N-1} & \cdots & a_{N,1} \end{bmatrix} \tag{4.7}$$
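The shifted-diagonal scheme of Eq 4.7 can be checked with a short software model. The sketch uses 0-based indices and hypothetical function names; the inner loop over `j` stands for the multiplications that the hardware performs in parallel.

```python
def shifted_diagonals(A):
    """Return C with C[j][l] = A[j][(j - l) % N].

    Column l of C holds the l-th shifted diagonal of A, i.e. the N
    coefficients used together in iteration l (compare Eq 4.7).
    """
    N = len(A)
    return [[A[j][(j - l) % N] for l in range(N)] for j in range(N)]


def matvec_by_rotation(A, x):
    """Compute A*x in N iterations of N independent multiply-accumulates."""
    N = len(A)
    C = shifted_diagonals(A)
    acc = [0] * N
    operand = list(x)                 # element j starts with its own state x[j]
    for l in range(N):
        for j in range(N):            # independent products, parallel in HW
            acc[j] += C[j][l] * operand[j]
        # Cyclic shift: element j receives the operand of element j-1
        operand = [operand[-1]] + operand[:-1]
    return acc
```

After `N` iterations every accumulator has seen each element of `x` exactly once, paired with the matching matrix coefficient, so the result equals the ordinary matrix-vector product.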

The same concept can be extended to the $M + 2$ matrix-vector products in Eq 4.1, yielding the coefficient matrix in Eq 4.8. Notice that the additional matrix-vector multiplications simply add more columns (i.e. more iterations).

Hence, a reasonable degree of parallelism can be achieved by decomposing the matrix-vector multiplications into sets of $N$ independent products.

$$\begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_{N-1} \\ C_N \end{bmatrix} = \begin{bmatrix}
a^0_{1,1} & a^0_{1,N} & \cdots & a^1_{1,1} & a^1_{1,N} & \cdots & b_{1,1} & \cdots & b_{1,P} \\
a^0_{2,2} & a^0_{2,1} & \cdots & a^1_{2,2} & a^1_{2,1} & \cdots & b_{2,1} & \cdots & b_{2,P} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
a^0_{N-1,N-1} & a^0_{N-1,N-2} & \cdots & a^1_{N-1,N-1} & a^1_{N-1,N-2} & \cdots & b_{N-1,1} & \cdots & b_{N-1,P} \\
a^0_{N,N} & a^0_{N,N-1} & \cdots & a^1_{N,N} & a^1_{N,N-1} & \cdots & b_{N,1} & \cdots & b_{N,P}
\end{bmatrix} \tag{4.8}$$
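A software model of the complete coefficient matrix may help to see how the columns of Eq 4.8 line up with the iterations. The sketch below is an illustration with hypothetical function names and 0-based indices; the column order (all of $A_0$, then each $A_i$, then $B$) follows Eq 4.8, while the scaling by $z_i$ is applied to the operand, as an assumption about where that product is formed.

```python
def coefficient_columns(A_list, B):
    """Build the columns of the coefficient matrix (compare Eq 4.8).

    A_list = [A0, A1, ..., AM]; each matrix contributes N shifted diagonals,
    and B contributes one column per input.
    """
    N = len(B)
    cols = []
    for A in A_list:
        for l in range(N):
            cols.append([A[j][(j - l) % N] for j in range(N)])
    for p in range(len(B[0])):
        cols.append([B[j][p] for j in range(N)])
    return cols


def delta_x(A_list, B, x, z, u):
    """Accumulate the state step; returns (delta_x, number of iterations)."""
    N = len(x)
    cols = coefficient_columns(A_list, B)
    scale = [1] + list(z)             # z_0 = 1 belongs to A0
    acc = [0] * N
    it = 0
    for zi in scale:
        for l in range(N):
            for j in range(N):        # N independent MACC operations
                acc[j] += cols[it][j] * zi * x[(j - l) % N]
            it += 1
    for p in range(len(u)):
        for j in range(N):            # every element sees the same input u[p]
            acc[j] += cols[it][j] * u[p]
        it += 1
    return acc, it
```

The returned iteration count equals $N(M + 1) + P$, matching the number of columns of the coefficient matrix.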

4.3 Module Design

Figure 4.1 presents a functional description of the module that implements the matrix-vector multiplications, as defined in the previous section. The architecture is well suited to FPGA implementation, since it makes ample use of the MACC functionality provided by the DSP48 slice resources, and of word-shifting, which consumes no logic resources in HW. It time-shares the DSP resources, while maximizing their utilization by exploiting parallelism.

[Figure 4.1 (block diagram): processing elements PE0 to PEN, each containing a MACC unit; coefficient words C0 to CN supplied by the matrix coefficient block memory; state operands x0 to xN; the inputs u_i and the parameters z_i enter through a parallel-in serial-out register.]

Figure 4.1: Functional Diagram of the State Calculation Logic

Xilinx DSP48A Block

The recursive method of computing the sum of products, for matrix multiplications, is particularly suitable for FPGA based implementations. Xilinx FPGAs include a number of dedicated DSP48 blocks for high bandwidth arithmetic operations. Figure B.1 presents the block diagram of the architecture of the DSP48A1 Slice for a Spartan6 FPGA. Through op-codes, a number of customization options are available, one of which is the Multiply-Accumulate (MACC) operation.

Each iteration in Eq 4.4 is a MACC operation. By using the DSP48 slices in the MACC mode, and enabling pipelining within the slice, the recursive sum can be computed at very high clock frequencies, thus enabling high sampling rates.

Of course, the drawback is that the DSP48 slices are available only in very limited numbers in an FPGA. Therefore, this resource needs to be utilized in a time-shared architecture. At the same time, any parallelism in the computations needs to be fully exploited, so as to obtain maximum utilization of the resources.
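The cost of time-sharing can be estimated from the iteration count given earlier: one state update takes $N(M + 1) + P$ MACC iterations, plus some control and pipeline overhead. The sketch below is a rough estimate only; the `overhead` parameter and the clock frequency used in the usage note are hypothetical placeholders, not measured values from this design.

```python
def cycles_per_update(N, M, P, overhead=0):
    """Clock cycles for one state update: N(M+1)+P iterations plus overhead.

    `overhead` stands for extra cycles (initialization, pipeline flush);
    its value here is an assumption, not a figure from the design.
    """
    return N * (M + 1) + P + overhead


def max_sampling_rate(f_clk_hz, N, M, P, overhead=0):
    """Upper bound on the model sampling rate for a given logic clock."""
    return f_clk_hz / cycles_per_update(N, M, P, overhead)
```

For instance, with $N = 4$, $M = 2$, $P = 2$, an assumed overhead of 4 cycles and an assumed 180 MHz logic clock, the bound evaluates to 10 MHz.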

[Figure 4.2 (block diagram): a processing element built around a DSP48 block configured as Multiply-Accumulate (MACC) with 2-stage pipelining, together with a second DSP48 block used as a multiplier. Ports: Coeff_DataIn, Input_In, Param_In, StateIn, StateOut, RegIn_1, RegIn_2, RegOut_1, RegOut_2. Control signals: LOAD_DATA, MACC_RST, IN_SEL, OUT_SEL. Data paths are 16 and 32 bits wide. Note in the figure: only for PE_1 is the multiplier (red lines) instantiated; for all other PE_i, the register is instantiated instead.]

Figure 4.2: State Calculation: Single Processing Element

4.3.1 Processing Element

The basic unit of the state calculation module is a Processing Element (PE). A model for a system of order $N$ will require $N$ PEs. Figure 4.2 illustrates the internal architecture of a PE. Each $PE_i$ corresponds to the state variable $x_i$. After every calculation iteration, the StateOut register contains the latest value of the state variable $x_{i,k}$.

Each PE encapsulates a single DSP48 Slice, configured as MACC with 2-stage pipelining. The first PE is an exception, since it contains an additional multiplier. This multiplier is needed to carry out the multiplication of the state variables with the time-varying parameters.

4.3.2 PE Integration

The Processing Elements are arranged in a cyclic shift register format, as illustrated in Figure 4.3. Essentially, there are two shift registers (composed of two independent registers within the PEs).

The system input ($u_k$) and the time-varying parameters ($z_i$) are fed in via Parallel-In-Serial-Out shift registers. Multiplexers could be used instead. However, slice registers are more numerous in FPGAs than slice LUTs, so shift registers are more efficient. The combinatorial logic is also reduced as a consequence of this approach.

[Figure 4.3 (block diagram): processing elements State_PE_1 to State_PE_N, each with the ports RegIn_1, RegIn_2, RegOut_1, RegOut_2, Input_In, Param_In, Coeff_In, StateIn and StateOut, holding the states x_1 to x_N. A counter generates the address (ADDR, Log2((R+2)N) bits) for the coefficient block memory (BRAM_EN), which has an N*16-bit data output. The Wishbone slave interface provides the inputs u_1 to u_M (U_REG_EN) and the parameters z_1 to z_R (Z_REG_EN). All data words are 16 bits wide.]

Figure 4.3: Architecture of the State Calculation Module

The coefficients, as well as the initial state values are stored in a Block RAM. The block RAM is a dual-port memory, and is interfaced to the LM32 CPU via the Wishbone Slave interface. This allows the configuration of model parameters, in the shape of matrix coefficients and initial conditions, via the software. The sampling rate of the calculation logic is software configurable, through control registers (also mapped on WB interface).

Block RAM Design

The block memory primitives are assembled in a custom arrangement, so that the coefficients for all parallel MACC operations are available in one clock cycle. This calls for a rather large bus width: the width of the data bus is $N$ times the word-length of the variables.

The first row contains the initial state values. Through a SOFT_RST signal from the SW, all state variables can be initialized to their initial value. The rest of the rows of the block RAM contain the coefficients, as given by Eq 4.8, with each column in the matrix corresponding to a row in the BRAM.

The last few rows must contain zeros, since the accumulator is still enabled in those iterations. Table 4.1 illustrates the memory layout for an example system with N = 4; M = 2; P = 2 and a word-length of 16-bits.

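| Address | Data Bus 63:48 | Data Bus 47:32 | Data Bus 31:16 | Data Bus 15:0 | Input to $PE_1$ MACC | Input to $PE_4$ MACC |
|---|---|---|---|---|---|---|
| 0x00 | $x_{1,0}$ | $x_{2,0}$ | $x_{3,0}$ | $x_{4,0}$ | X | X |
| 0x01 | $a^0_{1,4}$ | $a^0_{2,1}$ | $a^0_{3,2}$ | $a^0_{4,3}$ | $x_4$ | $x_3$ |
| 0x02 | $a^0_{1,3}$ | $a^0_{2,4}$ | $a^0_{3,1}$ | $a^0_{4,2}$ | $x_3$ | $x_2$ |
| 0x03 | $a^0_{1,2}$ | $a^0_{2,3}$ | $a^0_{3,4}$ | $a^0_{4,1}$ | $x_2$ | $x_1$ |
| 0x04 | $a^0_{1,1}$ | $a^0_{2,2}$ | $a^0_{3,3}$ | $a^0_{4,4}$ | $x_1$ | $x_4$ |
| 0x05 | $a^1_{1,4}$ | $a^1_{2,1}$ | $a^1_{3,2}$ | $a^1_{4,3}$ | $z_1 x_4$ | $z_1 x_3$ |
| 0x06 | $a^1_{1,3}$ | $a^1_{2,4}$ | $a^1_{3,1}$ | $a^1_{4,2}$ | $z_1 x_3$ | $z_1 x_2$ |
| 0x07 | $a^1_{1,2}$ | $a^1_{2,3}$ | $a^1_{3,4}$ | $a^1_{4,1}$ | $z_1 x_2$ | $z_1 x_1$ |
| 0x08 | $a^1_{1,1}$ | $a^1_{2,2}$ | $a^1_{3,3}$ | $a^1_{4,4}$ | $z_1 x_1$ | $z_1 x_4$ |
| 0x09 | $a^2_{1,4}$ | $a^2_{2,1}$ | $a^2_{3,2}$ | $a^2_{4,3}$ | $z_2 x_4$ | $z_2 x_3$ |
| 0x0A | $a^2_{1,3}$ | $a^2_{2,4}$ | $a^2_{3,1}$ | $a^2_{4,2}$ | $z_2 x_3$ | $z_2 x_2$ |
| 0x0B | $a^2_{1,2}$ | $a^2_{2,3}$ | $a^2_{3,4}$ | $a^2_{4,1}$ | $z_2 x_2$ | $z_2 x_1$ |
| 0x0C | $a^2_{1,1}$ | $a^2_{2,2}$ | $a^2_{3,3}$ | $a^2_{4,4}$ | $z_2 x_1$ | $z_2 x_4$ |
| 0x0D | $b_{1,1}$ | $b_{2,1}$ | $b_{3,1}$ | $b_{4,1}$ | $u_1$ | $u_1$ |
| 0x0E | $b_{1,2}$ | $b_{2,2}$ | $b_{3,2}$ | $b_{4,2}$ | $u_2$ | $u_2$ |
| 0x0F | 0 | 0 | 0 | 0 | X | X |
| 0x10 | 1 | 1 | 1 | 1 | X | X |
| 0x11 | 0 | 0 | 0 | 0 | X | X |
| 0x12 | 0 | 0 | 0 | 0 | X | X |

Table 4.1: Example Layout of the Coefficient Block RAM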

Owing to the requirement of parallel access to multiple data words from the block RAM, 100% utilization of the BRAM resources is not possible. The number of 9Kb BRAM primitives (RAMB8BWER) required depends on the order of the system and is given by $\lceil N/4 \rceil$. The utilization improves if the number of time-varying parameters is higher, since more rows of each primitive are then occupied. Therefore, it is worthwhile to assign slower states as time-varying parameters.
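The wide-word layout and the primitive count can be sketched as follows. The packing order (first word in the most significant bits) follows the column order of Table 4.1; the helper names `pack_row` and `bram_primitives` are hypothetical, and the words are treated as 16-bit two's complement values.

```python
import math

WORD_BITS = 16   # word-length used in the example of Table 4.1

def pack_row(words):
    """Pack N signed 16-bit words into one bus word, first word in the top bits."""
    value = 0
    for w in words:
        value = (value << WORD_BITS) | (w & 0xFFFF)   # two's complement wrap
    return value

def bram_primitives(N):
    """Number of 9Kb primitives for an N-th order system: ceil(N/4)."""
    return math.ceil(N / 4)
```

For a fourth-order system, one row is a single 64-bit word and one primitive suffices, while a fifth-order system already needs two.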

[Figure 4.4 (state diagram): states STATE_INIT, INIT_UPDATE, NEXT_PARAM, STATE_FEEDBACK, INPUT_UPDATE, ADD_PREV, UPDATE_DONE and IDLE. Each state sets the control outputs MACC_SHIFT, MACC_RST, IN_SEL and OUT_SEL, and most states increment BRAM_ADDR. Transitions are triggered by SOFT_RST, UPDATE_TRIG (which sets BRAM_ADDR=0x01, ST_SEQ_IN=1 and P_SEQ_IN=1), ADDR_LIM and combinations of ST_SEQ_OUT and P_SEQ_OUT; the transition on [ST_SEQ_OUT & !P_SEQ_OUT] additionally sets ST_SEQ_IN=1 and Z_REG_EN=1.]

Figure 4.4: Finite State Machine for State Calculation Module

The source code for the custom block RAM is given in the Appendix (C.1).

4.3.3 FSM Design

A finite state machine (FSM) controls the routing of data through the processing elements and generates the appropriate addresses for the coefficient Block RAM. Figure 4.4 illustrates the state diagram.