1. Micro Architecture and Finite Length

(1)


Olle Seger (olle.seger@liu.se) Andreas Ehliar (ehliar@isy.liu.se)

Dake Liu, Rizwan Azhgar

(2)

Outline

• Introduction

• Some Administrative Information

• Basic Components

• Finite Length, Overflow, 2's complement, rounding, saturation

• About Lab-1, the Senior Processor …

(3)

Administrative Information

• Labs
  • In groups of two students
  • No written report
  • Be prepared to answer questions (both of you) about how your design works
  • Mandatory

• FAQ:
  • Q: What if you miss a lab?
  • A: Do the lab yourself and show it at a later date (ssh to ixtab.edu.isy.liu.se)

(4)

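2’s Complement Number Representation

4-bit word, range [-1, 1-1/8]:

  weight   -1    ½    ¼    1/8
  bit       1    0    1    0

The same number as an 8-bit word, range [-4, 4-1/32]:

  weight   -4    2    1    ½    ¼    1/8  1/16  1/32
  bit       1    1    1    0    1    0    0     0

It’s easy to increase the number of bits: duplicate the sign bit on the left and concatenate zeros on the right.

It’s still the same number (here -1 + ¼ = -0.75 in both formats).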
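The widening rule above can be checked with a minimal Python sketch. The helper names `from_bits` and `extend` are made up for this illustration and are not part of any course tool; words are handled as bit strings so the duplicated sign bits and appended zeros are visible.

```python
def from_bits(bits, frac_bits):
    """Decode a two's-complement bit string with frac_bits fractional bits."""
    raw = int(bits, 2)
    if bits[0] == "1":
        # The MSB carries a negative weight, so subtract 2^width.
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)


def extend(bits, extra_int, extra_frac):
    """Widen a word: duplicate the sign bit on the left, append zeros on the right."""
    return bits[0] * extra_int + bits + "0" * extra_frac


narrow_word = "1010"                    # weights -1, 1/2, 1/4, 1/8
wide_word = extend(narrow_word, 2, 2)   # weights -4, 2, 1, 1/2, ..., 1/32

print(narrow_word, from_bits(narrow_word, 3))
print(wide_word, from_bits(wide_word, 5))
```

Both lines print the value -0.75, once for the 4-bit word and once for the 8-bit word.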

(5)

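2’s Complement Number Representation (reducing the number of bits)

[Figure: an 8-bit word is reduced to the 4-bit format with weights -1, ½, ¼, 1/8.]

  weight   -4    2    1    ½    ¼    1/8  1/16  1/32
  bit       x₁   x₂   1    0    1    0    y₁    y₂

• rounding: add y₁ at the 1/8 position

• truncate: drop the bits below 1/8 (y₁, y₂)

• saturate: remove the guard bits x₁, x₂; if the value does not fit in 4 bits, the result is set to MAX (0111) or MIN (1000). The figure labels the cases x₁=0, x₁x₂=10 and x₁x₂=11.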
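A minimal Python sketch of the three operations named above (rounding, truncation, saturation). It assumes the wide word is held as a signed integer with 5 fractional bits and the result as a signed integer with 3 fractional bits; the function name `narrow` and its parameters are illustrative only.

```python
def narrow(raw, drop_bits=2, out_bits=4, rounding=True):
    """Reduce a wide two's-complement integer to out_bits bits.

    drop_bits low-order bits are removed (with rounding or plain truncation),
    then the result is clamped to the range of an out_bits-bit word.
    """
    if rounding:
        # Adding half an output LSB before truncating is the same as
        # adding the first discarded bit (y1) to the kept bits.
        raw += 1 << (drop_bits - 1)
    raw >>= drop_bits                      # arithmetic shift drops y1, y2
    max_val = (1 << (out_bits - 1)) - 1    # 0111 for 4 bits
    min_val = -(1 << (out_bits - 1))       # 1000 for 4 bits
    return max(min_val, min(max_val, raw))


# Inputs are value * 32, outputs are value * 8.
print(narrow(15))                   # 15/32 rounds up to 4/8
print(narrow(15, rounding=False))   # 15/32 truncates to 3/8
print(narrow(48))                   # 1.5 does not fit: MAX = 7/8
print(narrow(-64))                  # -2.0 does not fit: MIN = -1
```

Rounding is done before saturation here so that a value that rounds up past MAX is still clamped.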

(6)

Adder(signed/unsigned)

Implicitly: integer, two’s complement

{c_o, O[15:0]} <= A[15:0] + B[15:0] + {15'b0, c_i}

Alternatively (operands sign-extended by one bit, carry-in entering through an extra LSB)

{c_o, O[15:0], x} <= {A[15], A[15:0], 1'b1} + {B[15], B[15:0], c_i}

Input operands : N bit ; Output : N+1 bit

Subtraction : ^O

[Figure: adder block with inputs A, B and c_i, and carry output c_o]
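The two adder expressions on this slide can be modelled in Python as a sanity check. This is a sketch under assumptions: the function names are invented, and the second form is read as a signed 17-bit result whose top bit is the sign of the sum rather than an unsigned carry.

```python
MASK16 = 0xFFFF


def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def add_unsigned(a, b, c_i=0):
    """{c_o, O[15:0]} = A + B + c_i, with c_o the unsigned carry out."""
    total = (a & MASK16) + (b & MASK16) + c_i
    return total >> 16, total & MASK16


def add_signed(a, b, c_i=0):
    """{c_o, O[15:0], x} = {A[15], A, 1} + {B[15], B, c_i} in 18 bits."""
    a, b = a & MASK16, b & MASK16
    a_ext = ((a >> 15) << 17) | (a << 1) | 1      # {A[15], A[15:0], 1}
    b_ext = ((b >> 15) << 17) | (b << 1) | c_i    # {B[15], B[15:0], c_i}
    total = (a_ext + b_ext) & 0x3FFFF             # keep 18 bits
    return total >> 17, (total >> 1) & MASK16     # discard the LSB x


print(add_unsigned(0xFFFF, 0x0001))   # carry out set, O = 0
print(add_signed(0xFFFF, 0x0001))     # -1 + 1 = 0 as a 17-bit signed sum
```

The low 16 bits agree in both forms; only the meaning of the extra top bit differs. The constant 1 in the LSB of the first operand makes c_i generate a carry into bit 1 exactly when c_i is 1.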

(7)

Multiplier(signed)

O[7:0] <= A[3:0] × B[3:0]

Example: Integer or Fractional Multiplication

Integer: 0111 × 0111 = 00110001

Fractional: 0.111 × 0.111 = 00.110001 = 0.1100010

[Figure: MULS block with 16-bit inputs OMA[15:0] and OMB[15:0] and the 32-bit output Mul_Output[31:0]]

Input operands : N bit ; Output : 2N bit
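The integer and fractional readings of the same product can be reproduced with a short Python sketch. The function name `mul_signed` is hypothetical; operands are passed as raw n-bit patterns.

```python
def mul_signed(a, b, n=4):
    """Multiply two n-bit two's-complement words; return the 2n-bit product pattern."""
    sa = a - (1 << n) if a >> (n - 1) else a
    sb = b - (1 << n) if b >> (n - 1) else b
    return (sa * sb) & ((1 << (2 * n)) - 1)


product = mul_signed(0b0111, 0b0111)
print(format(product, "08b"))        # integer view: 7 * 7 = 49

# Fractional view: the product of two words with 3 fractional bits has
# 6 fractional bits and two sign bits. Shifting left by one removes the
# duplicate sign bit and leaves 7 fractional bits.
fractional = (product << 1) & 0xFF
print(format(fractional, "08b"))
print(fractional / 128)              # 0.875 * 0.875
```

One case does not survive the left shift: -1 × -1 gives +1, which cannot be represented in a format whose largest value is just below 1.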

(8)

Signed multiplication

paper&pencil algorithm

      0111     7
    * 0111     7
  00000111     7
  00001110    14
  00011100    28
  00000000     0
  00110001    49

      1001    -7
    * 0111     7
  11111001    -7
  11110010   -14
  11100100   -28
  00000000     0
  11001111   -49

      0111     7
    * 1001    -7
  00000111     7
  00000000     0
  00000000     0
  11001000   -56
  11001111   -49

      1001    -7
    * 1001    -7
  11111001    -7
  00000000     0
  00000000     0
  00111000    56
  00110001    49
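The four worked examples follow one rule: sign-extend the multiplicand, add a shifted copy for every 1 in the multiplier, and subtract instead of add for the multiplier's sign bit, because that bit has a negative weight. A minimal Python sketch of that rule, with an invented function name:

```python
def signed_multiply(a, b, n=4):
    """Shift-and-add multiply of two n-bit two's-complement patterns.

    Returns the 2n-bit product pattern and the list of partial products.
    """
    mask = (1 << (2 * n)) - 1
    a_ext = a
    if a >> (n - 1):
        a_ext = a | (mask ^ ((1 << n) - 1))   # sign-extend a to 2n bits
    partials = []
    for i in range(n):
        pp = (a_ext << i) & mask if (b >> i) & 1 else 0
        if i == n - 1:
            pp = (-pp) & mask                 # sign bit of b has weight -2^(n-1)
        partials.append(pp)
    return sum(partials) & mask, partials


result, partials = signed_multiply(0b0111, 0b1001)
for pp in partials:
    print(format(pp, "08b"))
print(format(result, "08b"))
```

For 0111 × 1001 this prints the same partial products and sum as the third example above.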

(9)

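[Figure: MAC data path. Blocks: Register File, DM0 and DM1 (each addressed through ar), multiplier X, ALU, accumulator, scale, round, sat. Word formats marked on the buses: (1,15), (2,30), "sign extend to (10,30)", (10,30), (10,15), (1,15).]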
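The formats in the data path figure can be followed in a small Python sketch. It rests on assumptions that the slide does not state explicitly: (i,f) is read as i integer bits including the sign plus f fractional bits, round is taken to reduce (10,30) to (10,15), and sat to reduce (10,15) to (1,15). The function names are illustrative.

```python
def mac(acc, a, b):
    """acc += a * b, where a and b are signed (1,15) integers.

    The product has 30 fractional bits, i.e. (2,30). Python integers are
    unbounded, so the sign extension to the wider accumulator is implicit.
    """
    return acc + a * b


def acc_to_q15(acc):
    """Round the accumulator to 15 fractional bits, then saturate to (1,15)."""
    rounded = (acc + (1 << 14)) >> 15          # assumed (10,30) -> (10,15)
    return max(-32768, min(32767, rounded))    # assumed (10,15) -> (1,15)


half = 1 << 14            # 0.5 in (1,15)
acc = mac(0, half, half)
print(acc_to_q15(acc))    # 0.25 in (1,15)

for _ in range(7):        # accumulate 0.25 eight times in total: 2.0
    acc = mac(acc, half, half)
print(acc_to_q15(acc))    # 2.0 does not fit in (1,15): saturates
```

The second result shows why the extra integer bits matter: the sum 2.0 is held exactly in the accumulator and is only clamped when it is written back as a 16-bit word.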

(10)

A Rounding Example

   A₇A₆A₅A₄
 + 0 0 0 A₃
 = B₇B₆B₅B₄

   A₇A₆A₅A₄A₃
 + 0 0 0 0 1
 = B₇B₆B₅B₄X
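The two additions in this example give the same rounded nibble B₇…B₄. A short Python check over all 8-bit inputs, with function names invented for the illustration:

```python
def round_add_bit(a):
    """A7..A4 + 000A3: add the first discarded bit to the kept nibble."""
    return ((a >> 4) + ((a >> 3) & 1)) & 0xF


def round_add_one(a):
    """A7..A3 + 00001, then drop the extra LSB (X)."""
    return (((a >> 3) + 1) >> 1) & 0xF


# Exhaustive comparison over every 8-bit pattern.
assert all(round_add_bit(a) == round_add_one(a) for a in range(256))

print(format(round_add_bit(0b01011000), "04b"))
print(format(round_add_one(0b01011000), "04b"))
```

Neither form saturates, so an input such as 11111000 wraps around to 0000 in this sketch.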

(11)

Senior Assembler & Simulator

• Assembly Code Includes:
  • Assembly Instructions: LD, ST, ADD, CMP, …
  • Symbolic names for memory locations: labels
  • Assembler directives: .skip 31, .df 0.125, …

• Senior Assembler: Translates assembly code into executable binary code (Hex Format).

• Senior Simulator: Takes the hex file and provides a debugging environment.

Assembly Code (ex.asm) -> Assembler (srasm) -> ex.hex -> Simulator (srsim) -> Debugging + Output text file

(12)

Senior

Senior: DSP with lots of bells and whistles

• 32 16-bit general regs (r0-r31)

• 32 16-bit special purpose regs

• 4 32-bit accumulator regs + 8 guard bits (acr0-acr3)

(13)

About Senior

Special purpose registers

(14)

About Senior

• Memory
  • Where is the data? rom0
  • Where are the coefficients? rom0
  • But you need them at the same time. So?

• How to save the output to a text file
  • out 0x11, r31

• Important instructions
  • convxx
  • repeat vs cmp & jump
  • set, clr
  • move, ld, st

• Hint: check the cycles required for data to be ready and use NOP accordingly.

[Figure: block diagram with PM, CP, RF, DP and the data memories dm0 and dm1; memory labels ram0, ram1 and rom0]

(15)

About Senior

• move, load and store instructions

move r7,r14
move.eq r22, rnd mul2 acr3
set r21,711

ld0 r1,(ar1,r9)    ; r1 <- M0(ar1+r9)
ld1 r1,(ar0++%)    ; r1 <- M1(ar0)
                   ; ar0 = (ar0==top0)?bot0:ar0+step0
st1 (ar2++),r5     ; M1(ar2) <- r5, ar2++
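The modulo post-increment in the `ld1` comment, ar0 = (ar0==top0)?bot0:ar0+step0, can be sketched in Python. The function name and the example addresses are made up; only the update rule comes from the slide.

```python
def post_increment_modulo(ar, step, bot, top):
    """Next address for (ar++%): wrap to bot after top, otherwise add step."""
    return bot if ar == top else ar + step


# Walk a 4-word circular buffer at the (hypothetical) addresses 100..103.
ar = 102
visited = []
for _ in range(6):
    visited.append(ar)          # the access uses the old address
    ar = post_increment_modulo(ar, step=1, bot=100, top=103)
print(visited)
```

Because the wrap test is an equality with the top address, the pointer has to land exactly on top for the wrap to happen, which is always the case with a step of 1.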

(16)

About Senior

• Short arithmetic, logic, shift instructions

add r7,r14,r15
add.ne r7,r12

• Long instructions

addl.meq acr2,acr1,acr0
addl acr1,acr3,r2:0

convss acr0,(ar0++%),(ar1++%)
                   ; acr0 += M0(ar0)*M1(ar1), ar0 and ar1 post-incremented (modulo)

(17)

About Senior

• How to use “repeat”

• Hardware loop!

……
repeat label_end, 32
set r4,0xfa72
move r1,sr3
mac acr0, r0, r1
label_end
move r17,sr31
……

These 3 instructions are repeated. No (visible) loop counter. No test. No jump.

(18)

About Senior

How to use conditional branch “jump”

set r0,32             ; set loop counter
label_start
…
dec r0                ; decrement loop counter
jump.ne label_start   ; no delay slots
xxx                   ; branch delayed
yyy                   ; 3 cycles
zzz                   ;

(19)

About Senior

• jump instruction – Another Example

……
jump.ne ds2,label4
move r1,sr3           ; this will always execute
set r2,7              ; so will this
move r12,r3           ; but not this
label4
set r7,3
……

set r0,32             ; set loop counter
label_start
…
dec r0                ; decrement loop counter
jump.ne ds3 label_start
xxx
yyy
zzz

(20)

About srsim

• How to debug in the simulator (srsim)
  • h: help menu
  • r<n>: execute n instructions
  • l: list the instructions around the pc
  • p: print the values in the registers
    • Special registers: which are ar0 and ar1?
    • Accumulator registers: which is acr0?
  • g: run the whole program

(21)

Exercise

(22)

Exercise

(23)

Convolution

[Figure: FIR filter as a tapped delay line. The present sample x(n) and the previous samples x(n-1) … x(n-4) are held in registers (reg), multiplied by h(0) … h(4) and added; the sum passes through Round and Saturation to give y(n).]

$$y(n) = \sum_{k=0}^{4} h(k)\,x(n-k) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + h(3)x(n-3) + h(4)x(n-4)$$
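The convolution sum can be written directly as two nested loops. This Python sketch uses floating point and leaves out the Round and Saturation stages; it also assumes x(n) = 0 for n < 0. The function name `fir` is chosen for the illustration.

```python
def fir(x, h):
    """y(n) = sum over k of h(k) * x(n - k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if n - k >= 0:              # samples before the start count as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


h = [0.1, 0.2, 0.4, 0.2, 0.1]           # five example taps h(0)..h(4)
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(fir(impulse, h))                  # an impulse reads out the taps in order
```

Feeding in a unit impulse is a quick way to check a filter implementation, since the output must equal the coefficient list followed by zeros.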

(24)

Exercise 1.2

[Figure: memory layout. ram1 holds the coeffs h(0), h(1), … h(31); ar1 points into them and bot1, top1 mark the two ends. rom0 holds zeros followed by the signal x(0), x(1), … x(999); ar0 points into it.]

$$y(n) = \sum_{k=0}^{31} h(k)\,x(n-k), \qquad 0 \le n < 1000$$

;; coeffs copied rom0 -> ram1
fir_filter
    set r3,signal
    set r1,1000            ; loop counter
    set ar1,coeffs         ; ar1->coeffs
    set ar0,zeros          ; ar0->signals
    set step1,1
    set bot1,coeffs
    set top1,coeffs_end
;;
loop
    inc r3
    move ar0,r3
    repeat falt,32
    convss acr0,(--ar0),(ar1++%)
falt
    dec r1
    jump.ne ds3 loop
    move r31,rnd div2 acr0
    clr acr0               ; clear accu
    out 0x11,r31
;;
;; end of code
    out 0x13,r0

    .rom0
    .scale 2.0
signal
    .df 0.0000
    .df 0.588059
coeffs

(25)

Exercise 1.2 with ringbuffer

[Figure: memory layout. rom0 holds the coeffs h(0), h(1), … h(31); ar0 points into them and bot0, top0 mark the two ends. ram1 holds the ringbuffer x(0), x(1), … with pointer ar1 and limits bot1, top1, and the signal ekg x(0), x(1), … x(999) with pointer ar2.]

; ekg copied rom0->ram1
; zeros in ringbuffer
; pointers fixed
;;
    set r1,1000            ; loop counter
;;
loop
    ld1 r0,(ar2++)         ; read signal
    dec r1                 ; dec loop cnt
    st1 (ar1),r0           ; write r.b.
    repeat falt,31
    convss acr0,(ar0++%),(ar1++%)
falt
    move r2,ar1
    convss acr0,(ar0++%),(ar1++%)
    move ar1,r2
    jump.ne ds3 loop
    move r31,rnd div2 acr0
    clr acr0               ; clear accu
    out 0x11,r31
;; end of code
    out 0x13,r0
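The pointer handling in the ringbuffer listing can be mirrored in Python. In the listing, ar1 steps forward 31 times and is then restored after the 32nd convss, so from one sample to the next it ends up one position behind its previous start. The sketch below models only that effect. It uses floating point, skips the `rnd div2` scaling, and its names are invented for the illustration.

```python
def fir_ringbuffer(x, h):
    """FIR filter that keeps the last len(h) samples in a circular buffer."""
    n_taps = len(h)
    ring = [0.0] * n_taps          # "zeros in ringbuffer"
    p = 0                          # write position, plays the role of ar1
    y = []
    for sample in x:
        ring[p] = sample           # newest sample overwrites the oldest one
        acc = 0.0
        for k in range(n_taps):
            # Reading forward from p gives x(n), x(n-1), ..., wrapping around.
            acc += h[k] * ring[(p + k) % n_taps]
        y.append(acc)
        p = (p - 1) % n_taps       # next sample goes one position earlier
    return y


print(fir_ringbuffer([1, 2, 3, 4, 5, 6, 7], [1, 2, 3]))
```

Only one memory write is needed per input sample, where a shifted delay line would move every stored sample.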

(26)

x = x0 + sin

h

$$y(n) = \sum_{k=0}^{31} h(k)\,x(n-k)$$

(27)

Frequency domain

(28)

Exercise 1.3

[Figure: ringbuffer in registers r₀ … r₄ (initially 0) and coeffs h₀ … h₄ in registers r₅ … r₉]

First sample:

in r0,0x10
clr acr0
macss acr0,r0,r5
macss acr0,r1,r6
macss acr0,r2,r7
macss acr0,r3,r8
macss acr0,r4,r9
move r10,sat rnd acr0
nop
out 0x11,r10

Next sample:

in r4,0x10
clr acr0
macss acr0,r4,r5
macss acr0,r0,r6
macss acr0,r1,r7
macss acr0,r2,r8
macss acr0,r3,r9
move r10,sat rnd acr0
nop
out 0x11,r10

… Unroll the loop 5 times!

Step h,x forward. Fill in x backward.