1. Micro Architecture and Finite Length

Academic year: 2021

Olle Seger (olle.seger@liu.se) Andreas Ehliar (ehliar@isy.liu.se)

Dake Liu, Rizwan Azhgar

Outline

• Introduction
• Some Administrative Information
• Basic Components
• Finite Length, Overflow, 2’s complement, rounding, saturation
• About Lab-1, the Senior Processor …..

Administrative Information

• Labs
  • In groups of two students
  • No written report
  • Be prepared to answer questions (both of you) about how your design works
  • Mandatory
• FAQ:
  • Q: What if you miss a lab?
  • A: Do the lab yourself and show it at a later date (ssh to ixtab.edu.isy.liu.se)

2’s Complement Number Representation

Figure: a 4-bit word 1 0 1 0 with bit weights -1, ½, ¼, 1/8 (range [-1, 1-1/8]) is widened to an 8-bit word with bit weights -4, 2, 1, ½, ¼, 1/8, 1/16, 1/32 (range [-4, 4-1/32]). Duplicate the sign bit on the left and concatenate zeros on the right.

It’s easy to increase the number of bits. It’s still the same number.
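Widening a word without changing its value can be sketched in Python as below. This is an illustrative sketch only: the helper names `to_value` and `extend` and the bit-string representation are assumptions, not part of the course material.

```python
from fractions import Fraction

def to_value(bits, int_bits):
    """Value of a two's complement bit string that has `int_bits` bits
    (sign bit included) to the left of the binary point."""
    raw = int(bits, 2)
    if bits[0] == "1":                 # the MSB carries a negative weight
        raw -= 1 << len(bits)
    return Fraction(raw, 1 << (len(bits) - int_bits))

def extend(bits, extra_int, extra_frac):
    """Duplicate the sign bit on the left, concatenate zeros on the right."""
    return bits[0] * extra_int + bits + "0" * extra_frac

narrow = "1010"                        # weights -1, 1/2, 1/4, 1/8
wide = extend(narrow, 2, 2)            # weights -4, 2, 1, 1/2, ..., 1/32
print(wide, to_value(narrow, 1), to_value(wide, 3))  # 11101000 -3/4 -3/4
```

Both words evaluate to -3/4, which is the point of the slide: the extra bits change the range, not the number.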

Figure: reducing a wider word back to the 4-bit format with bit weights -1, ½, ¼, 1/8. The extra bits of the wide word are labelled x1 x2 and y1 y2. Panel labels: rounding, truncate, saturate. Saturation values: MAX = 0 1 1 1, MIN = 1 0 0 0. Annotated cases: x1 = 0, x1x2 = 10, x1x2 = 11.
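The three operations named on this slide can be sketched on raw integers as follows. The sketch assumes an 8-bit source word with 5 fractional bits and a 4-bit target with 3 fractional bits, and it assumes that "truncate" means discarding the two low-order bits; the function name `to_4bit` is hypothetical.

```python
def to_4bit(raw8, rounding=True):
    """Reduce an 8-bit raw value (scale 1/32) to a 4-bit raw value (scale 1/8)."""
    if rounding:
        raw8 += 0b10                   # add half an LSB of the target format
    shifted = raw8 >> 2                # discard the two low-order bits
    return max(-8, min(7, shifted))    # saturate to MIN = 1000, MAX = 0111

print(to_4bit(23), to_4bit(23, rounding=False))  # 6 5
print(to_4bit(60), to_4bit(-100))                # 7 -8
```

Here 23/32 = 0.71875 rounds to 6/8 = 0.75 but truncates to 5/8, while values outside [-1, 1-1/8] are clamped instead of wrapping around.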

Adder (signed/unsigned)

Implicitly: integer, two’s complement

{c_o, O[15:0]} <= A[15:0] + B[15:0] + {15'b0, c_i}

Alternatively

{c_o, O[15:0], x} <= {A[15], A[15:0], 1} + {B[15], B[15:0], c_i}

Input operands: N bit; Output: N+1 bit

Subtraction : O

Figure: adder block (+) with inputs A, B and c_i, and outputs O and c_o.
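The two adder formulations can be compared with a small Python model. This is a sketch under the assumption that both expressions are read as plain bit concatenations; the function names are hypothetical. In the second form the carry-in is injected through an extra least significant bit, so a single adder without a separate carry input produces the same O.

```python
MASK16 = 0xFFFF

def add_with_carry(a, b, c_i):
    """{c_o, O[15:0]} <= A + B + c_i"""
    total = (a & MASK16) + (b & MASK16) + (c_i & 1)
    return (total >> 16) & 1, total & MASK16

def add_carry_in_lsb(a, b, c_i):
    """{c_o, O[15:0], x} <= {A[15], A, 1} + {B[15], B, c_i}"""
    a &= MASK16
    b &= MASK16
    a_ext = ((a >> 15) << 16) | a          # {A[15], A[15:0]}
    b_ext = ((b >> 15) << 16) | b          # {B[15], B[15:0]}
    total = (((a_ext << 1) | 1) + ((b_ext << 1) | (c_i & 1))) & 0x3FFFF
    return (total >> 17) & 1, (total >> 1) & MASK16

print(add_with_carry(0x7FFF, 1, 1))    # (0, 32769)
print(add_carry_in_lsb(0x7FFF, 1, 1))  # (0, 32769)
```

Under this reading the O fields always agree. The top bit can differ: in the first form it is the unsigned carry out, in the second it is the sign of the 17-bit signed sum (for example 0xFFFF + 1 gives 1 in the first form and 0 in the second).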

Multiplier (signed)

O[7:0] <= A[3:0] * B[3:0]

Example: integer or fractional multiplication

0111 * 0111 = 00110001 or
0.111 * 0.111 = 00.110001 = 0.1100010

Figure: MULS block with 16-bit inputs OMA[15:0] and OMB[15:0] and 32-bit output Mul_Output[31:0].

Input operands: N bit; Output: 2N bit
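The integer and fractional readings of the same product can be checked with a short Python sketch. The function name `mul_signed` is hypothetical; the one-bit left shift is the step that turns 00.110001 into 0.1100010.

```python
def mul_signed(a, b, n=4):
    """Signed n x n -> 2n-bit multiply on raw two's complement bit patterns."""
    def signed(x):
        return x - (1 << n) if x & (1 << (n - 1)) else x
    return (signed(a) * signed(b)) & ((1 << (2 * n)) - 1)

product = mul_signed(0b0111, 0b0111)
print(format(product, "08b"))          # 00110001 (integer 49, or 49/64 as 00.110001)

fractional = (product << 1) & 0xFF     # drop the duplicated sign bit
print(format(fractional, "08b"))       # 01100010 (98/128 as 0.1100010)
```

49/64 and 98/128 are both 0.765625, which equals 0.875 * 0.875.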

Signed multiplication

paper & pencil algorithm

      0111      7
    * 0111      7
    --------
    00000111    7
    00001110   14
    00011100   28
    00000000    0
    --------
    00110001   49

      1001     -7
    * 0111      7
    --------
    11111001   -7
    11110010  -14
    11100100  -28
    00000000    0
    --------
    11001111  -49

      0111      7
    * 1001     -7
    --------
    00000111    7
    00000000    0
    00000000    0
    11001000  -56
    --------
    11001111  -49

      1001     -7
    * 1001     -7
    --------
    11111001   -7
    00000000    0
    00000000    0
    00111000   56
    --------
    00110001   49
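The partial-product rows of the paper and pencil method can be reproduced with a few lines of Python. This is a sketch, and the name `shift_add_mul` is hypothetical. The key points are that each row is the sign-extended multiplicand and that the row for the multiplier's sign bit is subtracted, because that bit has negative weight.

```python
def shift_add_mul(a, b, n=4):
    """Return (partial product rows, product) for two n-bit
    two's complement bit patterns a and b, all as 2n-bit patterns."""
    mask = (1 << (2 * n)) - 1
    a_signed = a - (1 << n) if a >> (n - 1) else a       # sign-extend the multiplicand
    rows = []
    for i in range(n):
        bit = (b >> i) & 1
        weight = -(1 << i) if i == n - 1 else (1 << i)   # sign bit: negative weight
        rows.append((bit * a_signed * weight) & mask)
    return rows, sum(rows) & mask

rows, product = shift_add_mul(0b0111, 0b1001)            # 7 * -7
print([format(r, "08b") for r in rows], format(product, "08b"))
# ['00000111', '00000000', '00000000', '11001000'] 11001111
```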

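Figure: datapath block diagram with the Register File, the data memories DM0 and DM1 (addressed via ar), the ALU, the multiplier (X), the accumulator, and the scale, round and sat stages. Word formats marked in the diagram: (1,15), (2,30), sign extend to (10,30), (10,30), (10,15), (1,15).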

A Rounding Example

      A7 A6 A5 A4
    +  0  0  0 A3
    = B7 B6 B5 B4

      A7 A6 A5 A4 A3
    +  0  0  0  0  1
    = B7 B6 B5 B4  X
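That the two rounding formulations give the same B can be verified exhaustively in Python. The sketch treats A as an unsigned 8-bit pattern, and the function names are hypothetical.

```python
def round_add_a3(a):
    """B = A[7:4] + A3: add the first discarded bit to the kept bits."""
    return (a >> 4) + ((a >> 3) & 1)

def round_add_one(a):
    """B = (A[7:3] + 1) with the lowest bit X dropped."""
    return ((a >> 3) + 1) >> 1

print(round_add_a3(0b01011000), round_add_one(0b01011000))  # 6 6
```

Note that neither form handles the case A[7:4] = 1111 with A3 = 1, where the result needs a fifth bit; that is where saturation comes in.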

Senior Assembler & Simulator

Assembly Code Includes:

Assembly Instructions: LD, ST, ADD, CMP, …

Symbolic name for memory locations: labels

Assembler directives: .skip 31, .df 0.125, …

Senior Assembler: Translates assembly code into an executable binary code (Hex Format).

Senior Simulator: Takes the hex file and provides a debugging environment.

Assembly Code (ex.asm) → Assembler (srasm) → ex.hex → Simulator (srsim) → Debugging + Output text file

Senior

Senior: DSP with lots of bells and whistles

• 32 16-bit general regs (r0-r31)

• 32 16-bit special purpose regs

• 4 32-bit accumulator regs + 8 guard bits (acr0-acr3)

About Senior

Special purpose registers

• Memory
  Where is the data? rom0
  Where are the coefficients? rom0
  But you need them at the same time. So?
• How to save the output to a text file
  • out 0x11, r31
• Important instructions
  • convxx
  • repeat vs cmp & jump
  • set, clr
  • move, ld, st

Hint: check the cycles required for data to be ready and use NOP accordingly.

Figure: block diagram with PM, CP, RF, DP and the data memories dm0 and dm1 (labels rom0, ram0, ram1).

• move, load and store instructions

move r7,r14
move.eq r22, rnd mul2 acr3
set r21,711
ld0 r1,(ar1,r9)   ; r1 <- M0(ar1+r9)
ld1 r1,(ar0++%)   ; r1 <- M1(ar0)
                  ; ar0 = (ar0==top0)?bot0:ar0+step0
st1 (ar2++),r5    ; M1(ar2)<-r5,ar2++
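The modulo post-increment described in the `ld1` comment can be modelled in Python as below. This is only a sketch of that one comment: the function name and the example addresses are hypothetical, and `top` is taken to be the last address inside the buffer, as the comparison `ar0==top0` suggests.

```python
def post_increment_modulo(ar, bot, top, step=1):
    """Model of an (ar++%) access: return (address used, next address)."""
    used = ar
    nxt = bot if ar == top else ar + step   # wrap to bot after the top address
    return used, nxt

ar = 100
seen = []
for _ in range(6):
    used, ar = post_increment_modulo(ar, bot=100, top=103)
    seen.append(used)
print(seen)  # [100, 101, 102, 103, 100, 101]
```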

• Short arithmetic, logic, shift instructions

add r7,r14,r15
add.ne r7,r12

• Long instructions

addl.meq acr2,acr1,acr0
addl acr1,acr3,r2:0

convss acr0,(ar0++%),(ar1++%)   ; acr0 += M0(ar0)*M1(ar1) , ar0 , ar1

• How to use “repeat”
• Hardware loop!

……
repeat label_end, 32
set r4,0xfa72
move r1,sr3
mac acr0, r0, r1
label_end
move r17,sr31
……

These 3 instructions are repeated. No (visible) loop counter. No test. No jump.

How to use conditional branch “jump”

set r0,32             ; set loop counter
label_start
dec r0                ; decrement loop counter
jump.ne label_start   ; no delay slots
xxx                   ; branch delayed
yyy                   ; 3 cycles
zzz                   ;

• jump instruction – Another Example

……
jump.ne ds2,label4
move r1,sr3           ; this will always execute
set r2,7              ; so will this
move r12,r3           ; but not this
label4
set r7,3
……

set r0,32             ; set loop counter
label_start
dec r0                ; decrement loop counter
jump.ne ds3 label_start
xxx
yyy
zzz

About srsim

• How to debug in the simulator (srsim)
  • h: help menu
  • r<n>: execute n instructions
  • l: list the instructions around the pc
  • p: print the values in the registers
    • Special registers: which are ar0 and ar1?
    • Accumulator registers: which is acr0?
  • g: run the whole program

Exercise

Convolution

Figure: 5-tap FIR filter. The present sample x(n) and the previous samples x(n-1), x(n-2), x(n-3), x(n-4), held in a chain of registers (reg), are multiplied by h(0), h(1), h(2), h(3), h(4) and summed. The sum passes through Round and Saturation to give y(n).

$$y(n) = \sum_{k=0}^{4} h(k)\,x(n-k) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + h(3)x(n-3) + h(4)x(n-4)$$
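The convolution sum maps directly onto two nested loops. The Python sketch below is a plain reference model, not the Senior implementation; it assumes x(n) = 0 for n < 0 and leaves out rounding and saturation. The function name `fir` is hypothetical.

```python
def fir(x, h):
    """y(n) = sum over k of h(k) * x(n - k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

print(fir([1, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5, 0]
```

Feeding in a unit impulse returns the coefficients themselves, which is a convenient first check for any FIR implementation.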

Exercise 1.2

$$y(n) = \sum_{k=0}^{31} h(k)\,x(n-k), \quad 0 \le n < 1000$$

Figure: the coefficients h(0), h(1), …, h(31) lie in ram1 and are addressed by ar1 (between bot1 and top1). The signal x(0), x(1), …, x(999) lies in rom0, preceded by zeros, and is addressed by ar0.

;; coeffs copied rom0 -> ram1
fir_filter
    set r3,signal
    set r1,1000          ; loop counter
    set ar1,coeffs       ; ar1->coeffs
    set ar0,zeros        ; ar0->signals
    set step1,1
    set bot1,coeffs
    set top1,coeffs_end
;;
loop
    inc r3
    move ar0,r3
    repeat falt,32
    convss acr0,(--ar0),(ar1++%)
falt
    dec r1
    jump.ne ds3 loop
    move r31,rnd div2 acr0
    clr acr0             ; clear accu
    out 0x11,r31
;;
;; end of code
    out 0x13,r0

    .rom0
    .scale 2.0
signal
    .df 0.0000
    .df 0.588059
coeffs

Exercise 1.2 with ringbuffer

Figure: the coeffs h(0), h(1), …, h(31) lie in rom0 and are addressed by ar0 (between bot0 and top0). The ringbuffer holding x(0), x(1), … lies in ram1 and is addressed by ar1 (between bot1 and top1). The input signal ekg, x(0), x(1), …, x(999), lies in ram1 and is addressed by ar2.

; ekg copied rom0->ram1
; zeros in ringbuffer
; pointers fixed
;;
    set r1,1000          ; loop counter
;;
loop
    ld1 r0,(ar2++)       ; read signal
    dec r1               ; dec loop cnt
    st1 (ar1),r0         ; write r.b.
    repeat falt,31
    convss acr0,(ar0++%),(ar1++%)
falt
    move r2,ar1
    convss acr0,(ar0++%),(ar1++%)
    move ar1,r2
    jump.ne ds3 loop
    move r31,rnd div2 acr0
    clr acr0             ; clear accu
    out 0x11,r31
;; end of code
    out 0x13,r0
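The pointer handling of the ringbuffer version can be modelled in Python. This is a sketch of the idea only, with hypothetical names and without rounding or saturation. The write position moves back by one slot per sample, which matches saving ar1 after 31 of the 32 modulo steps and restoring it after the last one.

```python
def fir_ringbuffer(signal, coeffs):
    """FIR filter that keeps only the last len(coeffs) samples in a ring buffer."""
    taps = len(coeffs)
    ring = [0] * taps                  # zeros in ringbuffer
    newest = 0                         # slot that receives the next sample
    out = []
    for sample in signal:
        ring[newest] = sample          # overwrite the oldest sample
        acc = 0
        idx = newest
        for k in range(taps):          # h(k) forward, x forward with wrap-around
            acc += coeffs[k] * ring[idx]
            idx = (idx + 1) % taps
        out.append(acc)
        newest = (newest - 1) % taps   # step back one slot for the next sample
    return out

print(fir_ringbuffer([1, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5, 0]
```

Compared with re-pointing ar0 into the full signal for every output sample, only a buffer as long as the filter has to sit in the second data memory.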

Figure: the input signal x = x0 + sin is filtered by h.

$$y(n) = \sum_{k=0}^{31} h(k)\,x(n-k)$$

Frequency domain

Exercise 1.3

Figure: registers r0, r1, r2, r3, r4 form the ringbuffer (initially all 0). Registers r5, r6, r7, r8, r9 hold the coeffs h0, h1, h2, h3, h4.

in r0,0x10
clr acr0
macss acr0,r0,r5
macss acr0,r1,r6
macss acr0,r2,r7
macss acr0,r3,r8
macss acr0,r4,r9
move r10,sat rnd acr0
nop
out 0x11,r10

in r4,0x10
clr acr0
macss acr0,r4,r5
macss acr0,r0,r6
macss acr0,r1,r7
macss acr0,r2,r8
macss acr0,r3,r9
move r10,sat rnd acr0
nop
out 0x11,r10

… Unroll the loop 5 times!

Step h, x forward. Fill in x backward.
