Parallel Hardware for Sampling Based Nonlinear Filters in FPGAs


Institutionen för systemteknik

Department of Electrical Engineering

Examensarbete (Master's thesis)

Parallel Hardware for Sampling Based Nonlinear

Filters in FPGAs

Master's thesis carried out in Electronics Systems at the Institute of Technology, Linköping University

by

Rakesh Kota Rajasekhar

LiTH-ISY-EX--14/4821--SE

Linköping 2014

Department of Electrical Engineering
Linköpings tekniska högskola
Linköpings universitet



Supervisor: Syed Asad Alam, ISY, Linköpings universitet

Examiner: Oscar Gustafsson, ISY, Linköpings universitet


Division, Department: Division of Electronics Systems, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2014-12-22
Language: English
Report category: Examensarbete (Master's thesis)
URL for electronic version: http://www.es.isy.liu.se
ISRN: LiTH-ISY-EX--14/4821--SE

Title: Parallel Hardware for Sampling Based Nonlinear Filters in FPGAs

Author: Rakesh Kota Rajasekhar

Abstract:

Particle filters are a class of sequential Monte Carlo methods commonly used to estimate unknowns of time-varying signals presented in real time, especially when dealing with nonlinearity and non-Gaussianity, as in BOT applications. This thesis performs one such estimation task: tracking a person using the road information available from an IR surveillance video. A parallel custom hardware implementation of a SIRF-type particle filter is developed for an Altera Cyclone IV E FPGA device. The implementation accounts for how the algorithmic aspects of this sampling-based filter relate to the possibilities and constraints of a hardware implementation. Using a 100 MHz clock frequency, the synthesised hardware design can process almost 50 Mparticles/s. The implementation successfully tracks the target, defined by a 5-dimensional state variable, using the noisy measurements available from the sensor.

Keywords: Particle filters, Hardware architectures, Bearings-only tracking (BOT), Resampling, Sampling Importance Resampling Filter (SIRF)


Abstract

Particle filters are a class of sequential Monte Carlo methods commonly used to estimate unknowns of time-varying signals presented in real time, especially when dealing with nonlinearity and non-Gaussianity, as in BOT applications. This thesis performs one such estimation task: tracking a person using the road information available from an IR surveillance video. A parallel custom hardware implementation of a SIRF-type particle filter is developed for an Altera Cyclone IV E FPGA device. The implementation accounts for how the algorithmic aspects of this sampling-based filter relate to the possibilities and constraints of a hardware implementation. Using a 100 MHz clock frequency, the synthesised hardware design can process almost 50 Mparticles/s. The implementation successfully tracks the target, defined by a 5-dimensional state variable, using the noisy measurements available from the sensor.


Acknowledgments

I am extremely thankful to my examiner Dr. Oscar Gustafsson for giving me the opportunity and for mentoring me to steer this thesis in the right direction. I would also like to convey my gratitude for all the support he provided throughout this academic journey. I am honoured to have grown professionally under his guidance.

I sincerely thank my supervisor Syed Asad Alam for his valuable supervision and helping hand. Whenever I got stuck on a difficult challenge in my thesis work, he made it look easier, which helped me proceed further.

I would like to thank all the professors and Ph.D. students at the Department of Electrical Engineering, Linköping University, who indirectly supported me during my thesis work. I also thank my thesis opponent for his patience and numerous suggestions on the thesis.

I would like to thank my parents, family and friends, who shared their love and support during hard times and helped me succeed in my master's studies.


Contents

1 Introduction
1.1 Background
1.2 Motivation
1.3 Intended Readers
1.4 Structure of the Report

2 Initial Analysis of the Requirement
2.1 Introduction
2.2 Characteristics of the Matlab Model
2.2.1 Overview
2.2.2 Perspective Camera
2.2.3 Filter Initialisation Step
2.2.4 Sample step
2.2.5 Importance weight step
2.2.6 Resample step
2.2.7 Point Estimate
2.3 Analysis on Data Representation
2.4 Characteristics of the Output Plot
2.5 Summary of the Requirements

3 Hardware Architecture for SIRF
3.1 Introduction
3.2 Quantization and Overflow Handling
3.3 Conceptual Schedule
3.4 Register and Counter
3.5 Time update
3.6 Measurement update
3.7 Random number generation
3.8 Scaling of Weights and Random Values
3.9 Resampling
3.10 Control Mechanism
3.10.1 Resampling Status
3.10.2 Memory Select Logic and Bus Control Logic
3.10.3 Memory Controller Block
3.10.4 Hardware Logic Controller
3.11 Execution Time

4 Testbench Setup
4.1 Introduction
4.2 Testbench Model
4.2.1 Sensor Information
4.2.2 Perspective Camera
4.2.3 Counters
4.2.4 Controller
4.2.5 Top Level of Testbench

5 Design Methodology and Results
5.1 Introduction
5.2 Simulation Results
5.3 Synthesis Results

6 Conclusion
6.1 Summary
6.2 Future Work

Bibliography


List of Figures

1.1 Functional Block Diagram of the SIRF.
1.2 Resampling using CDFs.

2.1 Output plot of the Matlab model.
2.2 Initialisation step from Matlab model.

3.1 Rounding and Truncation performed on 18-bit data to obtain 16-bit data.
3.2 Timing diagram for SIRF.
3.3 Two different counter components implemented in the design.
3.4 Functional Block Diagram of the Time Update Step.
3.5 Functional Block Diagram Showing the Communication of the Time Update Block in the Top Level Architecture.
3.6 Internal Architecture of Process Noise Generator.
3.7 Timing diagram of Time Update Step.
3.8 Functional Block Diagram of Measurement Update Step.
3.9 Functional Block Diagram Showing Communication of Measurement Update Step in the Top Level Architecture.
3.10 Timing Diagram of Measurement Update Step.
3.11 Structure of 8-bit Galois type LFSR.
3.12 Functional Block Diagram Showing Communication of Random Number Generation Block in the Top Level Architecture.
3.13 Timing Diagram of Random Number Generation Step.
3.14 Functional Block Diagram of Scaling Random Values and Weights.
3.15 Functional Block Diagram of Resampling Step.
3.16 Timing Diagram Illustrating the Start Cycle of Resampling in Top Level Architecture.
3.17 Timing Diagram of Write Process in the Resampling Step.
3.18 Timing Diagram of Read Process in the Resampling Step.
3.19 Functional Block Diagram Generating Resampling Status Signal.
3.20 Finite-State Machine Controlling the Select Signal for RS Status Signal Generation.
3.21 Timing of Resampling Status Signal.
3.22 Functional Block Diagram of Memory Select Logic and Bus Control Logic.
3.23 Functional Block Diagram of Memory Controller Block.
3.24 Functional Block Diagram of Hardware Logic Controller.
3.25 Finite-State Machine of Hardware Logic Controller.
3.26 Overview of Counters and HLC's FSM in the Schedule.

4.1 Overview of Testbench Interface.
4.2 Finite-State Machine of Testbench Controller.
4.3 Functional Block Diagram of Testbench Environment.

5.1 Top-Down Design Flow Methodology.
5.2 Estimated Position of the Target, in Frame ID Number 1395 (left) and 1396 (right), in IR View from FPGA (red circle) and Matlab (green circle).
5.3 Results of Particle Filter Tracking a Person in IR Surveillance.
5.4 Point Estimate for State Vector x Using 1000 Particles.
5.5 Point Estimate for State Vector y Using 1000 Particles.
5.6 Point Estimate for State Vector z Using 1000 Particles.
5.7 Point Estimate for State Vector vx Using 1000 Particles.
5.8 Point Estimate for State Vector vy Using 1000 Particles.
5.9 Compare Missing Measurements With Point Estimate for State Vector x Using 1000 Particles.
5.10 Point Estimate for State Vector x Using 2000 Particles.
5.11 Point Estimate for State Vector y Using 2000 Particles.
5.12 Point Estimate for State Vector vy Using 2000 Particles.
5.13 Maximum Deviation Seen on the IR View Using 1000 Particles.

List of Tables

2.1 Simple Random Resampling Algorithm.
2.2 Word Length Estimation for Time Update Step.
2.3 Word Length Estimation for Process Noise.
2.4 Word Length Estimation for Perspective Camera.
2.5 Word Length Estimation for Measurement Update and Random Number Generation Step.
2.6 Word Length Estimation for Resampling Step.

3.1 Pseudocode of Saturation Logic for 18-bit Data.
3.2 Types of register components utilised in the design.
3.3 Types of counters instantiated in the design.
3.4 Designing LUT Using VHDL Function to Evaluate Exponential of -2.
3.5 MCB Interpretation of Flip-Flop Outputs.

4.1 Word Length of Sensor Data.

5.1 Average Process Noise for State Vectors y and vy.
5.2 Summary of Synthesis Report on Resource Usage.

A.1 Generics Description at Testbench Level.
A.2 Generics Description of Counters in Hardware Model.
A.3 Generics Description of MU in Hardware Model.
A.4 Generics Description of Random Number Generation in Hardware Model.
A.5 Generics Description of Scaling Unit in Hardware Model.
A.6 Generics Description of Resampling in Hardware Model.
A.8 Generics Description of Bus Routing in Hardware Model.
A.9 Generics Description of Control Mechanism in Hardware Model.


Acronyms

BOT  Bearings-Only Tracking
BRAM  Block Random Access Memory
CDF  Cumulative Distribution Function
CSR  Cumulative Sum of Random values
CSW  Cumulative Sum of Weights
DSP  Digital Signal Processing
DSS  Dynamic State Space
EKF  Extended Kalman Filter
FPGA  Field-Programmable Gate Array
FSM  Finite-State Machine
GPU  Graphics Processing Unit
HDL  Hardware Description Language
HLC  Hardware Logic Controller
ID  Identification
IR  Infrared
LFSR  Linear Feedback Shift Register
LSB  Least Significant Bit
LUT  Look-Up Table
MAP  Maximum a Posteriori
MCB  Memory Controller Block
MMSE  Minimum Mean Square Error
MSB  Most Significant Bit
MU  Measurement Update
NRE  Non-Recurring Engineering
RT  Register Transfer
PSC  Parallel Step Counter
RRC  Random value Read Counter
RSC  Resampling Counter
SIRF  Sampling Importance Resampling Filter
SMC  Sequential Monte Carlo
TU  Time Update


Chapter 1

Introduction

1.1 Background

The history of solving state estimation problems dates back to the estimation of the positions of planets and comets using least squares by Gauss and Legendre in the early 19th century. The next step was taken by Wiener and Kolmogorov in the early 1940s, extracting a desired signal from a mixture of noise. Their work solves the problem under the restrictive assumptions of access to an infinite amount of data and of all involved signals being stationary stochastic processes. In 1960 a major breakthrough was made by Kalman, using state-space theory without the assumptions of stationary signals and access to an infinite amount of data. The Kalman filter is over 54 years old and is still one of the most important data fusion algorithms, used to this day in many satellite navigation systems, smartphones, etc. Between the 1960s and 1980s many suggestions were made to improve Kalman filtering theory to handle nonlinear models with non-Gaussian noise. One example is the EKF, which approximates the nonlinear model by a linear model and then utilises the Kalman filter to obtain an optimal solution. The results from this method show divergence, biases and lack of robustness in some cases [4]; moreover, since the nonlinearity is modified, an optimal solution is obtained to the wrong problem. A better way is to keep the nonlinearity and find an approximate solution. This is where sequential Monte Carlo methods come into play: they run simulations many times to obtain the distribution of an unknown probabilistic entity. This approach provides an approximate solution to the correct problem under consideration rather than an optimal solution to an incorrect problem [1], and it is a very flexible and attractive way to compute posterior distributions.

An important class of SMC methods is the particle filters, introduced by Gordon et al. in 1993. Particle filters are employed in scenarios where estimates need to be obtained from nonlinear systems with non-Gaussian noise, in which the signals are described by a system of equations formulated as a DSS model. The DSS model consists of two parts: one part describes the evolution of


the state of the system {X(t); t ∈ ℕ} with time, known as the state equation, and the other describes the observations {Y(t); t ∈ ℕ} as a function of the state, known as the observation or measurement equation. Mathematically the DSS model is given by:

X(t) = f_t(X(t − 1), w(t))   (1.1)

Y(t) = h_t(X(t), e(t))   (1.2)

where f_t and h_t are nonlinear functions describing the evolution of the state and the measurements over time t, and w(t) and e(t) are random noise vectors. The state equation in (1.1), together with the initial distribution p(X0) describing the prior knowledge of the hidden Markov state process at the initial time instant, is incorporated in the distribution p(X(t)|X(t − 1)). From the Bayesian point of view, all information about the state process is incorporated in the posterior distribution p(X(0:t)|Y(1:t)). The particle filter approximates this posterior as a random measure consisting of particles X(t)^(i) and their associated weights W(t)^(i). These weights are normalized such that Σ_{i=1}^{N} W(t)^(i) = 1, where N is the total number of particles used and i denotes the i-th particle. Generally, the random measure which approximates the posterior is given by:

p(X(0:t)|Y(1:t)) ≈ Σ_{i=1}^{N} δ(X(0:t) − X(0:t)^(i)) W(t)^(i)   (1.3)

where δ denotes the discrete distribution of the states X^(i) with corresponding probabilities W^(i), approximating a continuous probability density function. Using the approximation in (1.3), an estimate E(g(X(0:t))), where g is any function of the state X(0:t), can be computed as:

Ê(g(X(0:t))) = Σ_{i=1}^{N} g(X(0:t)^(i)) W(t)^(i)   (1.4)
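As a concrete illustration, the weighted estimate in (1.4), together with the normalization in (1.7), can be sketched in a few lines; the function name and the three-particle toy data are illustrative only, not part of the thesis model:

```python
import numpy as np

def point_estimate(particles, weights):
    """Point estimate E(g(X)) per (1.4), with g the identity function.

    particles: (N, d) array of particle states X^(i)
    weights:   (N,)  array of unnormalized importance weights
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                     # normalization step of (1.7)
    return w @ np.asarray(particles)    # sum_i g(X^(i)) W^(i)

# Toy example: three 1-D particles at 0, 1 and 2 with weights 1:2:1.
est = point_estimate([[0.0], [1.0], [2.0]], [1.0, 2.0, 1.0])  # -> [1.0]
```

Because the normalization divides by the weight sum, any constant factor in the weights cancels out, which is why unnormalized weights suffice as input here.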

In reality, particles cannot be drawn from the posterior, especially for nontrivial problems. Instead, an alternative density known as the importance function or proposal density is used to draw samples. This technique is called importance sampling. The importance function is denoted π(X) and must support the posterior. The unnormalized importance weight associated with each particle drawn from the importance function is given by:

W*^(i) = p(X^(i)) / π(X^(i))   (1.5)

In order to recursively update the importance weights, the proposal density is factorised, and then, using three important probabilistic properties, namely Bayes' theorem, the Markovian assumption of state evolution and the conditional independence of observations given the state [3], the recursive weight update equation is obtained as:

W(t)^(i) = W(t − 1)^(i) · p(Y(t)|X(t)^(i)) p(X(t)^(i)|X(t − 1)^(i)) / π(X(t)^(i)|X(0:t − 1)^(i), Y(1:t))   (1.6)

The importance weights are then normalised. Normalisation is necessary because the normalising factor of the posterior density is unknown; the importance weights are therefore only known up to this factor, which is resolved by normalising them. The normalised importance weight is given by:

W(t)^(i) = W(t)^(i) / Σ_{j=1}^{N} W(t)^(j)   (1.7)

The choice of importance function plays a crucial role in the performance of the particle filter. Usually the prior density p(X(t)^(i)|X(t − 1)^(i)) is chosen as the importance function, because it is easy to sample from and it greatly reduces the complexity of the importance weight equation. Even though the prior density is not the optimal choice, it is still a suitable one from a hardware implementation point of view. Substituting the prior as the importance function in (1.6), the importance weight equation simplifies to:

W(t)^(i) = W(t − 1)^(i) p(Y(t)|X(t)^(i))   (1.8)
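A minimal sketch of this simplified weight update, assuming a scalar measurement with Gaussian noise; the function name, the Gaussian likelihood and the noise level `sigma` are assumptions made for illustration (the thesis's actual likelihood comes from its measurement model):

```python
import math

def update_weight(prev_weight, measurement, predicted, sigma):
    """Weight update of (1.8): the new weight is the old weight times
    the likelihood p(Y(t)|X(t)^(i)) of the measurement given the
    particle's predicted measurement. Gaussian likelihood assumed.
    """
    err = measurement - predicted
    return prev_weight * math.exp(-0.5 * (err / sigma) ** 2)
```

The constant factor 1/(σ√(2π)) of the Gaussian density is omitted, since the normalisation in (1.7) cancels any constant common to all weights.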

Once the mathematical background of calculating the importance weights has been explored, we can look at the algorithmic steps involved in the particle filter. A standard particle filter performs three essential steps:

• Generation of particles, also known as the time update step.

• Computation of particle weights, also known as the measurement update step.

• Resampling

A filter that performs all three steps is called a SIRF. The SIRF is a recursive algorithm which propagates the random measure, composed of particles and their associated weights drawn from the importance function, in order to calculate the estimates of the unknown parameters from the observed data. A functional block diagram of the SIRF depicting these steps is shown in Fig. 1.1.

The particle filter is initialized by drawing samples from the prior density function p(X0) to generate the particles X(t). During this step, process noise is added as an unknown stochastic process to model the disturbances acting on the system. Using the input observations, the importance weights are updated with the likelihood function p(Y(t)|X(t)), which describes how likely it was to obtain the measurements given the information available in the particle. The normalized weights W(t), as shown in (1.7), are thus calculated at the output of the measurement update step. Over time the weights of the particles degenerate, meaning that their variance increases to a serious degree. To avoid this weight degeneracy problem, a resampling step is incorporated after the measurement update step. The resampling step returns new particles

Figure 1.1: Functional Block Diagram of the SIRF.

X̃(t) by rejecting the particles associated with lower weights and replicating particles with higher weights. Several resampling schemes have been discussed in previous works [9] [2]. The different schemes differ in the probability of drawing X(t) and in the way the resampling function is implemented. In the case of systematic resampling, the replacement of the old set of particles {X(t)^(i)}_{i=1}^{N} by the new set of particles {X̃(t)^(i)}_{i=1}^{N} is obtained using the probability given by:

Pr(X̃(t)^(i) = X(t)^(j)) = W(t)^(j)   (1.9)

After the resampling step, the normalised weights are set to 1/N, representing the average weights W̃(t). The resampled particles and the time-updated particles are used to calculate the estimates P. The estimator block performs the desired estimate for a given problem. Usually, when solving a BOT problem, the bearings or angles between the sensor and the moving target at fixed time intervals are provided as measurements to the particle filter. The task of the filter is then to find the unknown states of the target, such as its position and velocity relative to the sensor. The methodology to calculate the estimate of these unknown states is called a point estimate. Minimum variance, MMSE and MAP are some of the ways to calculate point estimates, discussed in [1].

Only systematic resampling is discussed here, as it has minimal variance [2] and is quite common from a hardware design perspective. In general, the time taken by resampling cannot be determined in advance, because the number of draws is random and cannot be predicted due to the varying random rejections. A random number generator is used for this step, commonly called the resampling function U^(i). Systematic resampling is performed by generating sorted uniform random numbers such that U^(i) = ((i − 1) + Ũ)/N, where Ũ ~ U(0,1), while the CSW of the particles is computed alongside. Both values are drawn sequentially and compared. If the CSW is greater than or equal to the current random number, the particle is replicated and the resampling function in turn produces a new random number. If the CSW is less than the random number, the particle is rejected and the next CSW is taken. This process is repeated until the full set of particles has been replicated. The process is illustrated in Fig. 1.2 using CDFs for 5 particles {X^(0), X^(1), X^(2), X^(3), X^(4)}.


Figure 1.2: Resampling using CDFs.

The x-axis represents the particle index from 0 to 4 and the y-axis represents the resampling function being systematically updated and compared with the CSW. Initially, CSW^(0) and U^(0) are drawn simultaneously and compared. Since CSW^(0) is greater than the random number, particle X^(0) is retained and the next random number U^(1) is obtained from the resampling function. CSW^(0) is now compared with the updated random number U^(1), and it is clear from Fig. 1.2 that CSW^(0) is greater than this random number as well, so particle X^(0) is replicated again. In a similar fashion, when CSW^(1) is compared with U^(3), the random number is greater, so particle X^(1) is rejected and the next CSW is drawn. Once all the random numbers have been compared with the CSW, the resampled particles are obtained, indicating the end of resampling. The resampled set of particles obtained from the original set in this example is {X^(0), X^(0), X^(0), X^(3), X^(3)}. It can be seen that the 1st particle X^(0) is replicated three times, the 4th particle is replicated twice, and the remaining particles are rejected due to their association with lower weights.
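The comparison scheme above can be sketched as follows. The weights are hypothetical, chosen so that the replication pattern mirrors the five-particle example (the first particle replicated three times, the fourth twice), and Ũ is fixed only to make the run repeatable:

```python
def systematic_resample(weights, u_tilde=0.5):
    """Systematic resampling as described in the text.

    weights: normalized particle weights W(t)^(i)
    u_tilde: the single U(0,1) draw; fixed here for repeatability
    Returns the indices of the replicated particles.
    """
    n = len(weights)
    u = [(i + u_tilde) / n for i in range(n)]   # sorted uniforms U^(i)
    indices, csw, j = [], weights[0], 0
    for i in range(n):
        while csw < u[i]:       # CSW below U^(i): reject particle j
            j += 1
            csw += weights[j]   # draw the next CSW
        indices.append(j)       # CSW >= U^(i): replicate particle j
    return indices

# Hypothetical weights reproducing the replication pattern above.
print(systematic_resample([0.5, 0.1, 0.05, 0.3, 0.05]))  # [0, 0, 0, 3, 3]
```

Each particle is visited at most once on the reject path, so the loop body runs O(N) times in total even though the number of rejections per draw varies, which matches the sequential draw-and-compare schedule described above.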

1.2 Motivation

Sampling-based filters are widely used in positioning, navigation and target tracking applications [5]. Conventional software implementations often lack sufficient computational power, and an efficient hardware design focussed on processing power could broaden the potential application areas of sampling-based filters. The Kalman filter is a well-known method in this group, providing the estimate of the state of a system described by a linear model with Gaussian noise. However, there are many scenarios where the state of a system cannot be well described by a linear system with Gaussian noise. In such situations, a nontrivial class of sampling-based filters called sequential Monte Carlo methods, also known as particle filters, can be used. Several resampling algorithms exist in the literature [2] [9] which are used to describe the particle filter, so this work can evaluate the hardware constraints faced when mapping one such algorithm into a hardware design.

A scenario like this is emulated using IR surveillance tracking to evaluate the suitability of a particle filter for this application. A parallel hardware design is needed to accomplish this task, and the idea of a parallel hardware design suits an FPGA well: FPGAs are reconfigurable, have no NRE costs, allow new bit streams to be uploaded remotely and have shorter design times. Moreover, FPGAs are very suitable for pipelined designs, as they contain numerous registers which can be utilised for this purpose. In addition, this work explores how the algorithmic aspects of the particle filter relate to the possibilities and constraints of a hardware implementation. Thus, this work proposes an implementation of parallel custom hardware in an FPGA.

1.3 Intended Readers

The reader of this document is someone interested in the hardware design of particle filters without going through the in-depth analytical solutions of the DSS model. The reader is also expected to have some mathematical background in probability and statistics.

1.4 Structure of the Report

This report is organised as follows:

• Chapter 2 discusses the analysis of the requirements provided for this work. It documents all the necessary details about the state of the system, inferred from the Matlab model, needed to start the hardware design, and how the performance metrics of the particle filter are calculated.

• Chapter 3 explains the architecture designed to implement the particle filter in the FPGA. It presents which types of digital components are used in the architecture and how the details inferred from chapter 2 relate to the possibilities and constraints of a hardware design.

• Chapter 4 describes the testbench, designed using various components to illustrate the given BOT problem, and shows how the problem is depicted in the testbench.

• Chapter 5 presents the results of the hardware designed in chapter 3 and the further analysis needed to check whether the obtained results meet the purpose of this work.

• Chapter 6 concludes and states the future work of this thesis.


Chapter 2

Initial Analysis of the Requirement

2.1 Introduction

The purpose of this chapter is to collect and analyse all the information presented in the requirement model. The requirement model was provided for the thesis work in order to analyse the possibilities and constraints involved in the design of the hardware model in the FPGA. This analysis plays a vital role in understanding the important features of the Matlab model provided for the work and in modifying the model in the later stages of the design and verification process. Moreover, it also helps in deciding some of the important characteristics of the data used in the hardware design and in understanding the various algorithmic aspects of the requirement. Section 2.2 describes the characteristics of the algorithm used in the Matlab model. Section 2.3 presents the details of the data representation and word lengths taken into account for the hardware design from the various steps of the resampling algorithm given in the Matlab model. Section 2.4 describes the features and the significance of understanding the Matlab model's output plot.

2.2 Characteristics of the Matlab Model

2.2.1 Overview

The algorithm provided for the hardware design, which describes the functionality of a particle filter, is an object-oriented programming model. It represents the most commonly used type of filter, the SIRF, consisting of the basic steps time update, measurement update and resampling. The functionality of these steps is explained in detail in chapter 3. The model employs a total number of particles N of 1000 and a state dimension Ns of 5, and simulates for about 1824 iterations, utilizing the real-time camera calibration values


and sensor measurement values in offline mode. The camera calibration process is carried out independently of the filter initialisation step and of the availability of the sensor measurements to the filter. The IR surveillance video used for this thesis work is stored as image frames, and these frames are read by the algorithm for the Matlab model simulation. The offline sensor measurement data are assigned an ID number corresponding to these frames. The filter is initialised when the iteration number matches the identification number for the first time; thereafter, whenever the iteration number equals the identification number, the algorithm considers that a sensor measurement is available in that particular frame or iteration. The algorithm is executed in the following sequence of steps: time update, measurement update and resampling. If no measurement is available, the measurement update and resampling steps are skipped and execution moves to the next iteration. Chapter 3 details how this algorithmic aspect of skipping these steps to start the next iteration is handled from a hardware design perspective, in order to receive synchronous samples in every iteration.
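The control flow described above can be sketched as follows; the `filt` object and its method names are hypothetical stand-ins for the SIRF steps, not the Matlab model's actual interface:

```python
def run_filter(num_iterations, measurements, filt):
    """Sketch of the model's control flow. `measurements` maps frame
    ID numbers to sensor data; `filt` bundles the SIRF steps. The
    measurement update and resampling run only in iterations whose
    number matches a frame ID; otherwise they are skipped and only
    the time update runs.
    """
    estimates = []
    for t in range(1, num_iterations + 1):
        filt.time_update()
        if t in measurements:            # sensor data exists for this frame
            filt.measurement_update(measurements[t])
            filt.resample()
        estimates.append(filt.point_estimate())
    return estimates
```

In the hardware design this skipping cannot simply shorten the iteration, since downstream blocks expect synchronous samples every iteration; chapter 3 describes how that is handled.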

2.2.2 Perspective Camera

The perspective camera of the Matlab model consists of two transformation functions; for the reader's convenience we name these functions "imagetoworld" and "worldtoimage". The output of the "imagetoworld" function is utilized in the initialisation step, and the reason why this transformation function is not rewritten in HDL during the testbench development is described in Sec. 2.2.3. The other transformation function, "worldtoimage", describes how the time-updated particles are seen through the camera over time t, and its output is utilized in the measurement update step to calculate the weight of a particle, as shown in (2.14) and Sec. 3.6. It is thus the "worldtoimage" function that concerns the design of the perspective camera component, and it is described by the following equations:

pp = (R × pw) + T
pp1 = [ pp_1/pp_3 ; pp_2/pp_3 ]   (2.1)

r2 = pp1_1² + pp1_2²   (2.2)

dx = [ (2 × Kc × pp1_1 × pp1_2) + (Kc × (r2 + 2 × pp1_1²)) ;
       (Kc × (r2 + 2 × pp1_2²)) + (2 × Kc × pp1_1 × pp1_2) ]   (2.3)

d = 1 + (Kc × r2) + (Kc × r2²) + (Kc × r2³)   (2.4)

pp2 = (d × pp1) + dx   (2.5)

pp3_1 = (Fc_11 × pp2_1) + (alphac × pp2_1) + cc_11
pp3_2 = (Fc_21 × pp2_2) + cc_21   (2.6)

where

R is a (3 × 3) matrix representing one of the calibration parameters,
pw is a (3 × N) matrix representing the probable positions of the target,
T is a (3 × 1) matrix representing one of the calibration parameters,
pp is an intermediate variable storing (3 × N) matrix values,
pp_1, pp_2 and pp_3 are the 1st, 2nd and 3rd rows of the pp matrix respectively,
pp1, pp2, pp3 and dx are intermediate variables storing (2 × N) matrix values,
pp1_1 and pp1_2 are the 1st and 2nd rows of the pp1 matrix respectively,
r2 and d are intermediate variables storing (1 × N) matrix values,
pp2_1 and pp2_2 are the 1st and 2nd rows of the pp2 matrix respectively,
alphac and Kc are calibration values, given by a (1 × 5) zero matrix and zero respectively, and
Fc and cc are calibration parameters, given by [937; 937] and [160; 120] respectively.

The nature of the calibration parameters used in the above equations is beyond the scope of this thesis, so we use these matrix values in the hardware design as given in the requirement.
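A sketch of the "worldtoimage" transform of (2.1)-(2.6) for a single world point. R and T default to identity and zero purely for illustration (the real calibration values come from the requirement model), while Fc, cc, Kc and alphac use the values quoted above:

```python
import numpy as np

def world_to_image(pw, R=np.eye(3), T=np.zeros(3)):
    """Sketch of "worldtoimage" for one 3-D point pw.

    R, T are placeholder calibration values; Fc, cc, Kc and alphac
    are the values given in the requirement (Kc and alphac are zero).
    """
    Fc, cc, Kc, alphac = (937.0, 937.0), (160.0, 120.0), 0.0, 0.0
    pp = R @ pw + T                              # rigid transform, (2.1)
    x, y = pp[0] / pp[2], pp[1] / pp[2]          # normalized coords pp1
    r2 = x * x + y * y                           # (2.2)
    dx = 2 * Kc * x * y + Kc * (r2 + 2 * x * x)  # tangential terms, (2.3)
    dy = Kc * (r2 + 2 * y * y) + 2 * Kc * x * y
    d = 1 + Kc * r2 + Kc * r2**2 + Kc * r2**3    # radial factor, (2.4)
    u, v = d * x + dx, d * y + dy                # distorted point, (2.5)
    return (Fc[0] * u + alphac * u + cc[0],      # pixel coords, (2.6)
            Fc[1] * v + cc[1])

px = world_to_image(np.array([1.0, 2.0, 10.0]))  # ~ (253.7, 307.4)
```

With Kc = 0 and alphac = 0, as in the requirement, the distortion terms vanish and the mapping reduces to pixel = Fc × normalized + cc, which is what the hardware component effectively has to compute.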

2.2.3 Filter Initialisation Step

The initialisation step is the starting point of the time update step; it locates the initial position of the target using the perspective camera model, which provides the normalised initial coordinates, i.e., the x, y and z dimensions of the target seen through the camera, to the filter. The perspective camera accomplishes this by transforming the featured location on the image, given by the sensor measurement, to a normalised featured location. The filter can then proceed with the remaining iterations without the need for this transformation, since the particle values representing the target are already normalised. For this reason, the transformation is only utilised during the initialisation step; for the sake of simplicity, the output values of this transformation function have therefore been used as the initialisation values for the hardware design instead of implementing the transformation function in the camera component, which is one of the testbench components communicating with the particle filter. The initial position coordinate values are given by:

x = 10.1773
y = −107.3503
z = 0.9000                                                                       (2.7)

The plot of the initialisation step is described in Sec. 2.4. However, the actual initialised locations of the target will differ between the hardware and Matlab models, since the process noise is added to the positional vectors given in (2.7). This difference in the initialised location between the two models is shown in Fig. 5.2.


16 Initial Analysis of the Requirement

2.2.4 Sample step

In this step we infer the state model equation, which represents the prior knowledge about the motion of the target, from the sample step (time update step) provided in the algorithm; it is described by the following equation:

X(t) = F X(t − 1) + w(t) (2.8)

where X(t) is the state of the system at time t, F is the state transition matrix and w(t) is the process noise at time t which is considered to be a white Gaussian random variable with covariance matrix Q. The state transition matrix and the covariance matrix are defined as:

F =
[ 1  0  0  C1  0
  0  1  0  0   C1
  0  0  1  0   0
  0  0  0  C2  0
  0  0  0  0   C2 ]                                                              (2.9)

where C1 = (1/a)(1 − e^(−aT)), C2 = e^(−aT), and

Q = 0.06 ×
[ T³/3  0     0      T²/2  0
  0     T³/3  0      0     T²/2
  0     0     0.001  0     0
  T²/2  0     0      T     0
  0     T²/2  0      0     T ]                                                   (2.10)

In (2.9) and (2.10) above, a is 0.05, which denotes the power spectral density of the process noise, and T is 0.04, which denotes the sample period. The state of the system, also known as the state variable X(t) in (2.8), consists of five state vectors, represented as x(t), y(t), z(t), vx(t) and vy(t), where x(t), y(t) and z(t) denote the position of the target in the x, y and z dimensions at time t respectively, and vx(t) and vy(t) denote the velocities in the x and y dimensions at time t respectively. Using this information about the state vectors and substituting (2.9) in (2.8), we derive the DSS model equations as follows:

x(t)  = x(t − 1) + vx(t − 1) × C1 + w1(t),
y(t)  = y(t − 1) + vy(t − 1) × C1 + w2(t),
z(t)  = z(t − 1) + w3(t),
vx(t) = vx(t − 1) × C2 + w4(t), and
vy(t) = vy(t − 1) × C2 + w5(t)                                                   (2.11)

where w1, w2, w3, w4 and w5 are the process noise values generated for the state vectors, and C1 and C2 are as described in (2.9). These derived model equations are used for the hardware design as described in Sec. 3.5. The process noise is computed by calculating the square-root of the process noise covariance matrix. The algorithm uses two matrices to represent the square-root of



the process noise covariance matrix, one being P0 for the initialization step and the other being Q0 for the remaining iterations. After calculating the square root, these matrices are given by:

√P0 =
[ 3  0  0     0  0
  0  3  0     0  0
  0  0  0.01  0  0
  0  0  0     1  0
  0  0  0     0  1 ]                                                             (2.12)

and

√Q0 =
[  5.85e−04   3.74e−18  4.07e−18  9.68e−04  7.30e−20
  −3.74e−18   5.85e−04  8.00e−22  5.48e−18  9.68e−04
   4.07e−18   8.00e−22  7.75e−03  4.58e−18  1.60e−23
   9.68e−04   5.48e−18  4.58e−18  4.90e−02  2.81e−20
   7.30e−20   9.68e−04  1.60e−23  2.81e−20  4.90e−02 ]                           (2.13)

We incorporate these matrix values in the hardware design for generating the process noise. However, some of the values in the matrix √Q0 are approximated due to word length concerns in the hardware design, as described in Sec. 3.5.

2.2.5 Importance weight step

As the next step in the algorithm, we infer the details provided in the measurement update step. Without going into a detailed description of this step, which is given in Sec. 3.6, we analyse the equations that compute the importance weights using the likelihood function:

W(t) = W (t − 1) × likelihood((Y (t) − ˆY(t)), e(t)) (2.14)

where the likelihood function is formulated as:

likelihood = (1 / √((2π)² × |e(t)|)) × e^(−0.5 × (dx1²/e1(t) + dx2²/e2(t)))      (2.15)

where dx1 and dx2 are the values obtained from the difference between the sensor measurement (Y) and the target actually observed through the camera (Ŷ), and e1(t) and e2(t) are the vectors of the measurement noise e(t). The importance weights obtained from (2.14) are normalised as:

W(t)^(i) = W(t)^(i) / Σ_{j=1..N} W(t)^(j)                                        (2.16)

where i = 1, ..., N. To avoid the division operation of (2.16) in the hardware implementation, we follow a scaling method as described in Sec. 3.8.
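For reference, (2.15) and (2.16) can be sketched in Python as follows. This is a hedged sketch: we take |e(t)| to be the product e1(t)·e2(t) of the diagonal noise variances, which is our reading of the determinant term in (2.15), and the function names are ours.

```python
import math

def likelihood(dx1, dx2, e1, e2):
    # (2.15): assumes |e(t)| = e1*e2 for a diagonal measurement noise covariance
    norm = 1.0 / math.sqrt((2 * math.pi) ** 2 * (e1 * e2))
    return norm * math.exp(-0.5 * (dx1 ** 2 / e1 + dx2 ** 2 / e2))

def normalise(weights):
    # (2.16): divide each weight by the sum over all particles
    s = sum(weights)
    return [w / s for w in weights]
```

A zero residual with unit variances gives the peak value 1/(2π), and normalisation makes the weights sum to one — exactly the division the hardware avoids through scaling.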


N = numel(W);
qs = cumsum(W);
qs(end) = 2;
uu = cumprod(rand(1,N).^(1./(N:-1:1)));
ut = fliplr(uu);
kk = 1;
ii = zeros(1, N);
for p = 1:N
    while (qs(kk) < ut(p))
        kk = kk + 1;
    end
    ii(p) = kk;
end
X = X(:,ii);
W = repmat(1/N, 1, N);

Table 2.1: Simple Random Resampling Algorithm.

2.2.6 Resample step

The final step in this algorithm is the resampling and the resampling scheme used in the Matlab model is called simple random resampling [1] [2]. The resampling algorithm implemented in the Matlab model is shown in Table 2.1.

Uniformly distributed, sorted random numbers U(0, 1) are generated using this scheme and compared with the cumulative sum of the normalised importance weights in order to obtain the resampled particles. In this thesis work we have used a scheme similar to systematic resampling, with the exception that the random numbers generated are unsorted; this is a different approach to the one stated in the Matlab model, and the reason behind it is described in detail in Sec. 3.7 and 3.8. Resampling is executed in every iteration except when measurements are not available. After the resampling step, the weights are set to (1/N) and the resampled particles are obtained, thus indicating the start of a new iteration.
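Purely as an algorithmic reference, classic systematic resampling — which the scheme used here resembles — can be sketched as follows. This is a sketch only: the thesis hardware differs by using unsorted random numbers and a scaled, division-free comparison (Sec. 3.7-3.9).

```python
import random

def systematic_resample(weights):
    # One uniform draw u0 stratifies N evenly spaced thresholds over [0, 1);
    # 'weights' are assumed to be normalised.
    N = len(weights)
    u0 = random.random() / N
    qs, acc = [], 0.0
    for w in weights:                 # cumulative sum of the weights
        acc += w
        qs.append(acc)
    indices, k = [], 0
    for p in range(N):
        u = u0 + p / N                # p-th stratified threshold
        while qs[k] < u:
            k += 1
        indices.append(k)
    return indices
```

A particle with all of the weight is replicated N times, while uniform weights reproduce every index once — the standard behaviour against which the hardware scheme can be checked.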

2.2.7 Point Estimate

A methodology to measure the performance of a sampling based filter estimate is called a point estimate [1]. Various ways of calculating the point estimate have been shown in previous research works [1] [3] [4]. The Matlab model computes the performance metrics based on the weight of the particle and the estimate of the covariance of X̂(t) to calculate the point estimate, where X̂(t) denotes the weighted particles given by:

X̂(t) = Σ_{i=0..N} (X^(i)(t) × W^(i)(t))                                         (2.17)



Data description            | Representation | Integer Bits | Fractional Bits | Total Number of Bits
state vector x              | signed         | 6            | 12              | 18
state vector y              | signed         | 9            | 9               | 18
state vector z              | signed         | 2            | 16              | 18
state vector vx             | signed         | 4            | 14              | 18
state vector vy             | signed         | 3            | 15              | 18
Time update memory data     | signed         | 24           | 66              | 90
Time update memory address  | unsigned       | 16           | -               | 16

Table 2.2: Word Length Estimation for Time Update Step.
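Table 2.2 suggests that the 90-bit time-update memory word is the five 18-bit state fields concatenated (24 integer and 66 fractional bits in total). A sketch of such packing, with the field order being our assumption:

```python
def pack_state(fields, wl=18):
    # Concatenate raw two's complement fields (given as integers) MSB-first
    # into one memory word; five 18-bit fields give the 90-bit word.
    word = 0
    for v in fields:
        word = (word << wl) | (v & ((1 << wl) - 1))
    return word
```

Masking with (1 << wl) − 1 keeps each field's two's complement bit pattern intact, so negative state values pack correctly.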

Using (2.17), the point estimate P is computed as:

P = Σ_{i=0..N} (W^(i)(t) × E{(X^(i)(t) − X̂(t))(X^(i)(t) − X̂(t))^T})            (2.18)

Some important characteristics are inferred from the analysis of the point estimate calculation in this model. Firstly, X̂(t) in (2.17) is used to calculate the current estimated position of the target, in which X^(i)(t) denotes the resampled particles. Secondly, while calculating the point estimate P, X^(i)(t) denotes the time updated particles, and the point estimates are not calculated for the initialisation step. Thirdly, while calculating the point estimate, W^(i)(t) in (2.17) and (2.18) represents either the normalized importance weights or (1/N) in case of unavailability of the sensor measurement.
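The weighted mean of (2.17) can be sketched as follows (a helper of our own naming; the covariance-weighted metric of (2.18) follows the same summation pattern):

```python
def point_estimate(particles, weights):
    # (2.17): weighted mean over the particle set, one sum per state dimension
    dims = len(particles[0])
    return [sum(w * p[d] for p, w in zip(particles, weights))
            for d in range(dims)]
```

With uniform weights (1/N) this reduces to the plain average of the particles, which is the fallback used when no sensor measurement is available.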

2.3 Analysis on Data Representation

After analysing the equations described in Sec. 2.2, it is prudent to decide the numerical properties of the data involved in the hardware computations. Numerical properties such as the type of number representation and the word length of the data are taken into consideration. These properties have a great influence on the precision of the desired result and on the power and resource utilization of the hardware design. The hardware design described in chapter 3 is implemented with all data in a fixed-point, two's complement number system in order to handle arithmetic computations involving negative fractional binary numbers. Tables 2.2, 2.3, 2.4, 2.5 and 2.6 present the details of these numerical properties for the various steps of the algorithm, following the analysis of the Matlab model.

As we recall from Sec. 2.2.1, the dimension of the target Ns is 5, and thus the word length for each state vector shown in Table 2.2 is determined to accommodate the precision required in all dimensions. From Table 2.3 it is clear that the process noise is generated for each state vector in order to achieve effective randomness in each dimension. R11, R12, ..., R33 presented in Table 2.4 represent each of the matrix elements of R.



Data description                   | Representation | Integer Bits | Fractional Bits | Total Number of Bits
process noise covariance matrix    | signed         | 3            | 15              | 18
process noise for state vector x   | signed         | 6            | 12              | 18
process noise for state vector y   | signed         | 9            | 9               | 18
process noise for state vector z   | signed         | 2            | 16              | 18
process noise for state vector vx  | signed         | 4            | 14              | 18
process noise for state vector vy  | signed         | 3            | 15              | 18

Table 2.3: Word Length Estimation for Process Noise.

Data description               | Representation | Integer Bits | Fractional Bits | Total Number of Bits
calibration parameter Fc       | signed         | 11           | 7               | 18
calibration parameter cc       | signed         | 10           | 8               | 18
calibration parameter R11      | signed         | 2            | 16              | 18
calibration parameter R12..R33 | signed         | 1            | 17              | 18
calibration parameter T        | signed         | 1            | 17              | 18
camera output1, output2        | signed         | 10           | 8               | 18

Table 2.4: Word Length Estimation for the Perspective Camera Step.



Data description                 | Representation | Integer Bits | Fractional Bits | Total Number of Bits
likelihood function              | signed         | 2            | 38              | 40
weight                           | signed         | 1            | 39              | 40
cumulative sum of weights        | signed         | (log2 N)+1   | 39              | (log2 N)+40
random number                    | signed         | 1            | 39              | 40
cumulative sum of random number  | signed         | (log2 N)+1   | 39              | (log2 N)+40
weight memory data               | signed         | (log2 N)+1   | 39              | (log2 N)+40
weight memory address            | unsigned       | 16           | -               | 16
random value memory data         | signed         | (log2 N)+1   | 39              | (log2 N)+40
random value memory address      | unsigned       | 16           | -               | 16

Table 2.5: Word Length Estimation for Measurement Update and Random Number Generation Step.

Data description           | Representation | Integer Bits | Fractional Bits | Total Number of Bits
scaled weights             | signed         | 1            | 49              | 50
scaled random number       | signed         | 1            | 49              | 50
resampled particle index   | unsigned       | 16           | -               | 16
resampling memory data     | unsigned       | 16           | -               | 16
resampling memory address  | unsigned       | 16           | -               | 16

Table 2.6: Word Length Estimation for Resampling Step.



The camera output is a (2 × 1) matrix, and its column vectors are represented as output1 and output2 in that table. The cumulative sums of random values and weights shown in Table 2.5 have log2(N) extra bits added to their word length so as to avoid overflow during the cumulative summation, which is described in detail in Sec. 3.8. The cumulative sum of random values and the scaling methodology are not part of the requirement model. However, this methodology is followed in the hardware design to avoid the division operations involved in the importance weight computation, thereby saving a lot of hardware resources. Moreover, this scaling methodology was verified by modifying the given algorithm and inspecting its output plot, as described in Sec. 2.4. The memory addresses mentioned in Tables 2.2, 2.5 and 2.6 denote both the read and write addresses generated for all the memory components involved in the hardware design. All these addresses are considered to be unsigned, interpreted as standard integers ranging from 0 to (2^16 − 1). The characteristics of these memories are explained in chapter 3. In addition, the word lengths presented in this section can be modified using the parameterisation of the hardware components described in appendix A.

2.4 Characteristics of the Output Plot

It is worthwhile to analyse the features of the output plot to give the reader a better picture of the requirement of this thesis work. As a part of this work, we have reused the "worldtoimage" function described in Sec. 2.2.2 to transform the sensor measurements to a two-dimensional plane, so as to ensure that the Matlab model's output follows the direction in which the measurements are available over time. The output plot, which enables the observer to visualise the given requirement, is shown in Fig. 2.1.

The IR view of the camera obtained from the recorded surveillance video is shown on the left side of the figure. The fact that a passive IR sensor is used for this application explains the IR view in the plot and the need to estimate the position and velocity in a two-dimensional plane using noisy measurements of the angular positions of the target. The green circle seen in the IR view denotes the estimated position given by the Matlab model to keep track of the target in that particular frame. On the right side of the plot, a top view of this tracking process is shown. The magenta dots denote the sensor measurements provided to the filter, and the green line shows the estimated path taken by the target as given by the Matlab model. The blue x mark denotes the current position specified by the Matlab model, obtained using (2.17). If one takes a closer look near the x mark in the 2-D plane, the resampled particles are displayed as green dots near the x mark. This shows how the resampled particles are distributed while the target is being tracked.

[Figure 2.1 shows the output plot of the Matlab model: on the left, the IR-view at t = 91.52 s; on the right, the Map-view at the same instant, plotting the position east of the sensor origin [m] against the position north of the sensor origin [m], with a legend for the Matlab estimate, the measurements and the current Matlab positioning.]

Figure 2.1: Output plot of the Matlab model

Fig. 2.2 presents how the resampled particles are distributed during the initialisation step in the IR view for the example simulation shown in Fig. 2.1. The target is encircled by the green circle placed around the middle of the distribution, which is also shown in this zoomed IR view. Since the distribution of resampled particles makes it hard to observe the target in this view, from now on only the current position will be represented in the IR view. The objective of this work is thus clearly implied by this analysis: the hardware estimate has to converge towards the measurements, similar to the Matlab model's tracked path shown in Fig. 2.1. This output is obtained using a total of 1000 particles and simulated for 895 iterations.

2.5 Summary of the Requirements

The analysis discussed in this chapter involves understanding the functionality of the filter and the type of resampling algorithm incorporated to track the target in the Matlab model. Which functionalities of the perspective camera need to be rewritten in HDL has been determined through this analysis. Essential characteristics, like the word lengths involved in the various steps of the algorithm (time update, measurement update and resampling), have been analysed mathematically. This analysis lays the foundation for the algorithmic modifications needed in the hardware design, which are discussed in the next chapter. Finally, the methodology for obtaining the performance metrics of the filter estimate and the features of the simulated output have also been analysed.


Chapter 3

Hardware Architecture for SIRF

3.1 Introduction

The main objective of this work is to design hardware for a particle filter. This chapter provides a detailed description of each module developed for this synchronous hardware design. In computationally intensive applications, keeping the data within range is necessary. Section 3.2 describes the quantization modes utilized in this design. Section 3.3 describes how the filter steps are scheduled to achieve better throughput. Essential components like the counters and registers used in the various modules of the design are described in Sec. 3.4. The hardware implementation of the sample step described during the analysis is explained in Sec. 3.5. The hardware implementation of the simplified form of the importance weight equation from the analysis is described in Sec. 3.6. The design of the random function and the scaling methodology are described in Sec. 3.7 and 3.8 respectively. The resampling step performed in the hardware design is detailed in Sec. 3.9. The various controller modules implemented to accomplish the working of a recursive filter are described in Sec. 3.10. Finally, the sample rate of the overall hardware design is determined in Sec. 3.11.

3.2 Quantization and Overflow Handling

As mentioned earlier in Sec. 2.3, the hardware design is implemented in a fixed-point binary number system to achieve reasonable performance along with low power consumption and low silicon cost when compared to a floating-point system. There are various kinds of number systems available to represent a number. This work utilizes the two's complement number system, which consists of a magnitude part to the left of the imaginary radix point and a fractional part to the right of it, where the magnitude part is composed of a sign bit and an integer part. The range of representable numbers in the two's complement number system is given by (2^(m−1) − 2^(−f), −2^(m−1)), and a number can be represented by the following equation [6]:

v = −d(m−1)·2^(m−1) + d(m−2)·2^(m−2) + ... + d(1)·2^1 + d(0)·2^0 + d(−1)·2^(−1) + d(−2)·2^(−2) + ... + d(−f)·2^(−f)          (3.1)

where d denotes the digit weight (0 or 1, depending on the two's complement coding), and m and f denote the number of magnitude and fractional bits respectively. The radix point is considered to be imaginary because the hardware handles the computations like integer operations, so the designed architecture in HDL takes care of interpreting the binary numbers appropriately. It is clear from Tables 2.2 to 2.5 that most data word lengths are 18 bits, giving a dynamic range of 131071. From a fractional binary number point of view, data represented down to 2^(−18) can be handled. If finite length data are not managed properly, the precision of the computed results can be worthless. Values which exceed this dynamic range due to the finite word length need to be handled using concepts like quantization. Rounding and truncation are the two quantization techniques implemented in this hardware for the case when the number of fractional bits of a given fixed-point number is less than that of the computed result to be represented. There are various ways to perform rounding in hardware; the method followed in this work is illustrated with an example in Fig. 3.1, where a 1 is added to the bit below the position at which the number is to be rounded.

[Figure 3.1 illustrates the example on 18-bit data (8 integer bits, 10 fractional bits): a 1 is added at the bit below the rounding position, the result is rounded to 8 fractional bits, and the final result is obtained after truncating the remaining bits from the LSB side.]

Figure 3.1: Rounding and Truncation performed on 18-bit data to obtain 16-bit data.
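The interpretation of (3.1) and the rounding of Fig. 3.1 can be sketched in Python (helper names are ours; plain integers stand in for the raw bit patterns):

```python
def fx_value(bits, m, f):
    # (3.1): real value of an (m+f)-bit two's complement word
    total = m + f
    v = bits - (1 << total) if bits >= (1 << (total - 1)) else bits
    return v / (1 << f)

def fx_round(bits, drop):
    # Fig. 3.1: add a 1 at the bit below the new LSB, then truncate 'drop' bits
    return (bits + (1 << (drop - 1))) >> drop
```

The shift by f places the imaginary radix point, and the add-then-shift in fx_round is exactly the "add 1 to the bit next to the rounding position" step described above.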

Saturation is a technique followed to counteract overflow during arithmetic operations. Saturation is performed when the output of a computation produces more bits on the MSB side than the required result can represent. Saturation has been performed in this hardware design in the following scenarios:

• when the addition of operands of the same sign results in the opposite sign in the result, and
• when performing the multiplication ((−1) × (−1)) using fractional multipliers.

An example of the saturation logic is shown in Table 3.1. We assume the result obtained from the above scenarios is denoted by the variable R, which is 19-bit data, and the saturated result is denoted by the variable R1, which is 18-bit data.

if R(msb) != R(msb-1)
    if R(msb) = 1
        R1 = -131072;
    else
        R1 = 131071;
    end
end

Table 3.1: Pseudocode of Saturation logic for 18-bit data.

[Figure 3.2 shows the timing diagram for the SIRF: the time update, measurement update and cumsum steps overlap across iterations, annotated with the latencies Ltu and Lmu and the durations N, 2N − 1 and N + 3 cycles.]

Figure 3.2: Timing diagram for SIRF.
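The saturation of Table 3.1 amounts to clamping the 19-bit intermediate result into the 18-bit two's complement range, which can be sketched as:

```python
def saturate_18(r):
    # Clamp to the 18-bit two's complement range [-131072, 131071],
    # mirroring the pseudocode of Table 3.1.
    return max(-131072, min(131071, r))
```

Values already inside the range pass through unchanged; only true overflows are pinned to the nearest representable extreme.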

3.3 Conceptual Schedule

Speed is a critical factor in achieving performance oriented hardware. When the number of particles is increased, the throughput of the hardware decreases significantly. So, to achieve better speed in the hardware, a parallel hardware design is needed. In this work the essential steps of the resampling algorithm are overlapped, similar to the pipelined SIRF utilising systematic resampling stated in [3], except for the cumulative sum step. The timing diagram of the SIRF designed in this work is shown in Fig. 3.2, and it is achieved using the pipelining concept [7].

Ltu and Lmu are the latencies caused by pipelining in the time update step and the measurement update step. Even though the pipelining induces latency in these intermediate steps, it helps to completely overlap the first three steps of the filter to achieve a better sample rate. Pipelining also helps the hardware to execute the next iteration without any interruption by receiving the samples consecutively after the first resampled particle is obtained in the resampling step. The resampling step is partially overlapped with the next iteration, which starts after (N + 3) cycles; the reason for this latency is explained in Sec. 3.9. The latencies in each step are described in the later sections of this chapter, and the latencies shown in this schedule help to determine the minimum sample period of the filter. In addition, the execution time of the resampling step is deterministic, as described in Sec. 3.9.

Component Name     | Data Type | Enable Output Port
Register Type I    | Signed    | Available
Register Type II   | Signed    | Not available
Register Type III  | Unsigned  | Not available

Table 3.2: Types of register components utilised in the design.

3.4 Register and Counter

Before we look into the architectures designed for the various stages of the particle filter, we discuss some of the basic components of which the architectures are composed. One of them is a component well known to the digital world, called a register. A register is a combination of several edge triggered flip-flops which can hold data when needed and helps to avoid unnecessary memory accesses. In this work we have used three different customised versions of register components. The details of these components are described in Table 3.2. The use of these registers in the various parts of the hardware architecture is discussed in the later sections of this chapter.

Register Type I holds the signed data type and has an enable output port. When this register is enabled to read the held data to the subsequent stage, the output port will enable the subsequent stage. A similar logic applies to the other types of registers shown in Table 3.2, except that they hold different data types and have no enable output port. The number of bits that can be held in these registers is controlled by the generic parameterisation discussed in appendix A. The other important component of this hardware design is the counter. The functionality of this counter is to increment the counter initialized values on every clock cycle. The two different types of counter components used in the design are illustrated in Fig. 3.3.

Both of these counter types generate a 16-bit unsigned number from 0 to (max − 1), where max is the maximum value parameterised into the counter. A characteristic common to both counters is that once the counter value reaches its maximum, it is reset to zero. An additional, distinct characteristic of counter type II is that when the clear input port is set high, the value of the counter is reset to zero; the need for this counter is discussed in Sec. 3.9. The counters are considered important in this design because the schedule discussed in the previous section can only work precisely based on the filter cycle time, and this cycle time is controlled by these counters. Using these two counter components, different types of counters are instantiated based on their need in the design. All the instantiated counters used in the architecture can be categorised under these two counter components. The registers and counters implemented in this design are synchronous components. The brief introduction to all the counters shown in Table 3.3 enables the reader to visualize their characteristics when they are discussed with the architectures in the later sections of this chapter.

[Figure 3.3 shows the two counter components: counter type I, with clock, reset and enable inputs and a 16-bit output port, and counter type II, which additionally has a clear input.]

Figure 3.3: Two different counter components implemented in the design.
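The behaviour of the two counter components can be sketched as one parameterised model (a behavioural Python sketch of the HDL components; counter type I simply never asserts clear):

```python
class Counter:
    # Counts 0 .. max_value-1 and wraps to zero; 'clear' models the extra
    # input of counter type II, which resets the count when set high.
    def __init__(self, max_value):
        self.max_value = max_value
        self.value = 0

    def tick(self, enable=True, clear=False):
        if clear:
            self.value = 0
        elif enable:
            self.value = (self.value + 1) % self.max_value
        return self.value
```

The modulo wrap mirrors the automatic reset at max, and the synchronous clear takes priority over the increment, as in the type II component.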

3.5 Time update

The purpose of this step is to generate particles every time a new iteration is started. We accomplish this by using the equations derived in (2.11) as the DSS model describing the evolution of the states over time. The hardware components involved in the time update step are shown in Fig. 3.4.

The initialisation block shown in this figure consists of the initial positional vectors of the target as described in (2.7). The DSS model block computes the state equations presented in (2.11). A demultiplexer is utilised to select the enable signals between these two blocks. The initialisation block is only enabled for the first iteration; from then on, only the DSS model block is enabled. Once the output is ready from either of these two blocks, the process noise is added to it using the process noise generator block. Both the DSS model and the process noise generator are enabled at the same time by the controller block, as illustrated in the timing diagram shown in Fig. 3.7. The functionality of this controller block is discussed in Sec. 3.10. The select signals involved in this time update component are controlled using a counter and a select logic, as shown in Fig. 3.5.

Counter Name | Counter Type | Counter Functionality
PSC | Type I | Takes care of the cycle time from the start of the time update step till the cumsum step.
RSC | Type I | Takes care of the cycle time after the cumsum step till the end of the resampling step.
RRC | Type I | Increments for every read of the cumsum of all the random values from the memory to the resampling step.
Sample counter | Type I | Increments on every new iteration.
Address counters for time update, random value and resampling memories | Type I | Increment for either a read or a write into the time update, random value and resampling memories.
Address counters for weight memory | Type II | Increment for every read or write into the weight memory.
SIRF counter | Type I | Takes care of the cycle time to send a new sensor measurement from the testbench.
Final estimation counter | Type I | Takes care of the cycle time to obtain the resampled particles from the last iteration.
Iteration counter | Type I | Increments the frame identification number on every new iteration.

Table 3.3: Counters instantiated in the design.

[Figure 3.4 shows the functional block diagram of the time update step: an initialisation block and a DSS model block selected via a demux, the covariance matrix blocks sqrt(P0) and sqrt(Q0) selected via mux0 and mux1, a process noise generator built from a splitter, five internal blocks and a merger, and an adder combining the selected state data with the generated noise.]

Figure 3.4: Functional Block Diagram of the Time Update Step.

The covariance matrix blocks in this figure generate the square-root of the covariance matrix values, as described in (2.12) and (2.13), for the process noise generation. Referring to Table 2.3, we can infer that only 15 fractional bits were feasible for this matrix, so the values which cannot be represented within this limit are approximated. The approximated values were chosen randomly such that they are close to their original values. The matrix values after approximation are given by:

√Q0 =
[  5.85e−04   6.00e−05  6.00e−05  9.68e−04  1.20e−04
  −6.00e−05   5.85e−04  2.40e−04  8.00e−05  9.68e−04
   6.00e−05   2.40e−04  7.75e−03  8.00e−05  2.40e−04
   9.68e−04   8.00e−05  8.00e−05  4.90e−02  1.20e−04
   1.20e−04   9.68e−04  2.40e−04  1.20e−04  4.90e−02 ]                           (3.2)

[Figure 3.5 shows the communication of the time update block in the top level architecture: the hardware controller block enables the time update block through the TU select control and sample counter, a demux directed by the sel1 signal writes the 90-bit output into TU memory0 or TU memory1 through the memory controller, and the output also feeds the perspective camera.]

Figure 3.5: Functional Block Diagram Showing the Communication of the Time Update Block in the Top Level Architecture.

The components of an internal block in the process noise generator are shown in Fig. 3.6. r1, r2, r3, r4 and r5 are the random numbers generated using an LFSR to randomize the noise values. Since the covariance matrix is a (5 × 5) matrix, the process noise generator employs five such internal blocks, which significantly reduces the latency of the time update step at the cost of extra multipliers. Since the speed of the filter is a critical requirement for this application, the process noise has been implemented using five of these internal blocks without the need to pipeline the time update step.

[Figure 3.6 shows the internal architecture of the process noise generator: the 90-bit input is split into 18-bit fields, multiplied with the random numbers r1-r5, and the products are summed into one 18-bit output.]

Figure 3.6: Internal Architecture of Process Noise Generator.

The TU select control block shown in Fig. 3.5 distinguishes the initialisation step from the other iterations using the sample counter value, selecting the proper covariance matrix values and state equation values through the multiplexers mux0 and mux1 shown in Fig. 3.4. The sample counter is incremented for every new iteration by the hardware controller. The "sel1" signal which controls the demux is provided by the bus control block, which is discussed in Sec. 3.10. Once the adder has computed the state equation, it enables the perspective camera to receive the time updated particle value. This enable out signal from the adder also enables the memory controller to write the time updated particle value into the memory. The memory implemented for this step is a single-port synchronous (N × 90)-bit BRAM. Since there is a need to keep the particles from the previous time instant while the new set of particles is being updated, two BRAMs are utilised for the time update step: while one BRAM is used to read out the resampled particles from the previous iteration, the other BRAM is used to write the time updated particles of the current iteration. The read and write processes of the TU memory are handled by the memory controller block; the control details are explained in Sec. 3.10. The timing diagram shown in Fig. 3.7 illustrates the cycle time consumed to generate a single particle value from the time update block.

Figure 3.7: Timing diagram of the time update step.
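The double-buffering of the TU memory can be mirrored in software as a simple ping-pong buffer pair whose read and write roles swap every iteration. The buffer depth and the update function below are placeholders; only the swapping scheme reflects the design.

```python
N = 8  # particle count (placeholder; the design uses N-deep BRAMs)

def time_update(particle):
    return particle + 1  # stand-in for state equation + process noise

bram = [[0] * N, [0] * N]  # two BRAMs: one read, one write per iteration
read_sel = 0

for iteration in range(3):
    write_sel = 1 - read_sel
    for i in range(N):
        # read a previous-iteration particle, write its updated value
        bram[write_sel][i] = time_update(bram[read_sel][i])
    read_sel = write_sel  # the memory controller swaps the roles
```

Because reads and writes always target different BRAMs, a single-port memory suffices for each, and the previous iteration's particles remain intact until the whole new set has been produced.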

The first input Di to this component is given in the second clock cycle in this example, with the enable signal of the DSS model block set high. The corresponding output Ao from the adder is obtained at the third clock cycle, implying that the latency for the time update step Ltu is determined to be 2 clock cycles. Thus the output of this block is written to a memory and to the perspective camera simultaneously, using the enable out signal from the time update block, to proceed with the measurement update step.
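As a rough software check of what this pipelining implies, assuming the block accepts one new particle per clock after its initial 2-cycle latency (the particle count below is an illustrative placeholder):

```python
f_clk = 100e6   # clock frequency used in the thesis (Hz)
latency = 2     # time-update latency L_tu in cycles
n = 10_000      # example particle count (placeholder)

# Fully pipelined: first output after `latency` cycles,
# then one output every cycle.
cycles = latency + (n - 1)
time_s = cycles / f_clk
throughput = n / time_s  # particles per second
```

The fixed latency is thus negligible for realistic particle counts; the throughput of this block alone approaches one particle per clock cycle, while the overall filter rate quoted in the abstract is lower because the other filter stages share the same pipeline.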
