3.MAC Design

Academic year: 2021

(1)

Dake Liu (dake@isy.liu.se) Olle Seger (olle.seger@liu.se)

Andreas Ehliar (ehliar) Jian Wang

• MAC Introduction

• Multiplier

• MAC Design: A Case Study

• Exercises

(2)

MAC

• Most important HW module in the DP (datapath) of any DSP processor

• Supports algorithms like

– Convolution (most frequently used in DSP algorithms)

– Filtering, FIR, IIR, Auto-Correlation, Cross-Correlation

– Transforms (FFT, DCT etc.)

– Double Precision Arithmetic Operations

• MAC Building Blocks

– Multiplier

– Accumulator

– ACR Registers

– Multiplexers

– Functions (e.g. Rounding, Scaling, Saturation, Flags etc.)
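
To make the link between convolution and the MAC concrete, here is a minimal Python sketch of a direct-form FIR filter in which every tap costs one multiply-accumulate. The function name `mac_fir` and the variable `acr` are illustrative choices, not names from the slides, and the sketch ignores word lengths, guards, rounding, and saturation.

```python
def mac_fir(samples, coeffs):
    """Direct-form FIR filter: one multiply-accumulate per tap (illustrative)."""
    outputs = []
    for n in range(len(samples)):
        acr = 0  # accumulator is initialized to 0 for each convolution sum
        for k, coeff in enumerate(coeffs):
            if n - k >= 0:
                acr += coeff * samples[n - k]  # the MAC step
        outputs.append(acr)
    return outputs


if __name__ == "__main__":
    print(mac_fir([1, 2, 3, 4], [1, 1]))
```

The inner loop is the operation the hardware MAC is built to do once per clock cycle.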

(3)

We use only Signed Multipliers

Unsigned: +1 = 1.000 is extended to 01.000

Signed: -1 = 1.000 is extended to 11.000

[Figure: SIGNED MULT, (N+1) bits. The N-bit inputs A and B are each extended to N+1 bits under U/Sign control; the product P is 2N+2 bits.]

-1 x -1 (N+1 bit Mult)

        11.000    (-1)
        11.000    (-1)
    ----------------
    0000.000000
    0000.00000
    0000.0000
    1111.000      (-1)
    0010.00       (+2)
    ----------------
    0001.000000   (+1)

Correct; Sat. will work

+1 x +1 (N+1 bit Mult)

        01.000    (+1)
        01.000    (+1)
    ----------------
    0000.000000
    0000.00000
    0000.0000
    0001.000      (+1)
    0000.00
    ----------------
    0001.000000   (+1)

Correct; Sat. will work
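
The worked examples can be reproduced with a small Python sketch of a shift-and-add two's-complement multiplier. It assumes the usual rule that the multiplier's sign bit carries a negative weight, which is why the last partial product of -1 x -1 is +2. The helper names `to_signed` and `signed_multiply` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def signed_multiply(a_bits, b_bits, width):
    """Multiply two width-bit two's-complement patterns via partial products.

    Returns the 2*width-bit product pattern.
    """
    a = to_signed(a_bits, width)
    acc = 0
    for i in range(width):
        if (b_bits >> i) & 1:
            partial = a << i
            # The multiplier's sign bit has weight -2^(width-1), so its
            # partial product is subtracted instead of added.
            acc += -partial if i == width - 1 else partial
    return acc & ((1 << (2 * width)) - 1)


if __name__ == "__main__":
    # 11.000 x 11.000 and 01.000 x 01.000 with 5-bit operands
    print(format(signed_multiply(0b11000, 0b11000, 5), "010b"))
    print(format(signed_multiply(0b01000, 0b01000, 5), "010b"))
```

Both calls give the pattern 0001000000, read as 0001.000000 (+1) with six fractional bits.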

(4)

MAC Design: A case study

• Functions

– Integer / Fractional multiplication (16 x 16)

– Signed / Unsigned multiplication

– Convolution with 8 guards and initialization

– Round

– Saturation

– 32 bits Long Plus and Minus

– 32 bits Long operation: ABS

(5)

MAC Design – Start

[Figure: block diagram of MUL and ACR. Labels: OMA[15:0], OMB[15:0], "0", ACR keeper function, accumulation feedback.]

Here ACR is the accumulation register. ACR is 32 bits according to the SPEC.

"0" gives the initialization of ACR for a convolution.

(6)

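MAC Design – Guard at MUL-out

[Figure: the same datapath with an 8-bit guard block (G8) at the MUL output. The MUL output is 32 bits; the G8 output, the accumulation feedback, and ACR are 40 bits.]

The accumulation feedback is 40 bits now!

G8 at MUL output is:

out[39:32] = {8{in[31]}}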
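
As a minimal sketch of the guard rule `out[39:32] = {8{in[31]}}`, the Python function below sign-extends a 32-bit product pattern to 40 bits. The function name `guard8` is an illustrative choice.

```python
def guard8(product32):
    """Sign-extend a 32-bit pattern to 40 bits: out[39:32] = {8{in[31]}}."""
    product32 &= 0xFFFFFFFF
    sign = (product32 >> 31) & 1
    guards = 0xFF if sign else 0x00  # eight copies of the sign bit
    return (guards << 32) | product32


if __name__ == "__main__":
    print(hex(guard8(0x80000000)))
    print(hex(guard8(0x7FFFFFFF)))
```

With 8 guard bits the accumulator has headroom for up to 256 full-scale products before the 40-bit word itself can overflow.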

(7)

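MAC Design – Input MUX for MUL

[Figure: the same datapath (MUL 32 bits, G8, 40-bit ACR, "0") with an input multiplexer in front of each MUL operand. OMA is selected from REG, RF-Port1, and Mem1; OMB is selected from REG, RF-Port2, and Mem2.]

The MUL input comes from the register file or from memory.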

(8)

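MAC Design – Load to ACR

[Figure: ACR split into ACRH (bits 39:16, 24 bits) and ACRL (bits 15:0, 16 bits). The 16-bit inputs "0", OMA, and OMB pass through G8 blocks to become 24-bit values; OMA and OMB are selected from REG, RF-Port1/Mem1 and REG, RF-Port2/Mem2.]

The Guard8 following "0", OMA, and OMB is:

out[23:16] = {8{in[15]}}

Double load to ACR:

Merge {H[23:0], L[15:0]} into the 40-bit ACR.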
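
The double load can be sketched in Python as follows. This assumes that the guard replicates the sign bit of the 16-bit word (bit 15) into bits 23:16, and that the high word lands in ACRH and the low word in ACRL; the names `guard8_16` and `load_acr` are hypothetical.

```python
def guard8_16(word16):
    """Extend a 16-bit word to 24 bits by replicating its sign bit (bit 15)."""
    word16 &= 0xFFFF
    guards = 0xFF if (word16 >> 15) & 1 else 0x00
    return (guards << 16) | word16


def load_acr(high16, low16):
    """Build the 40-bit ACR value {H[23:0], L[15:0]} from two 16-bit words."""
    return (guard8_16(high16) << 16) | (low16 & 0xFFFF)


if __name__ == "__main__":
    print(hex(load_acr(0x8000, 0x0001)))
    print(hex(load_acr(0x1234, 0x5678)))
```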

(9)

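MAC Design – ACR1 and ACR2 are used

[Figure: datapath with two 40-bit accumulators, ACR1 (ACR1H, ACR1L) and ACR2 (ACR2H, ACR2L), each split into a 24-bit high part and a 16-bit low part. OMA and OMB are selected from REG, RF-Port1/Mem1 and REG, RF-Port2/Mem2; G8 blocks sit on the 16-bit load paths and at the MUL output.]

Two accumulator registers are used for MAC.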

(10)

MAC Design – Signed and Unsigned

[Figure: the 16-bit operands OMA and OMB are each extended to 17 bits under U/Sign control. MUL is 17 x 17 with a 34-bit output, which Guard 6 extends to 40 bits. Guard 8 blocks remain on the 16-bit load paths to ACR1 (ACR1H, ACR1L) and ACR2 (ACR2H, ACR2L).]

Signed: A,B = {[15],[15:0]}

Unsigned: A,B = {1'b0,[15:0]}
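
A minimal Python sketch of the operand extension: each 16-bit operand gets a 17th bit (a copy of bit 15 when signed, 0 when unsigned), and a single signed 17 x 17 multiplication then serves both cases. The names `extend17` and `mul17` are illustrative, and the per-operand signed flags are an assumption about how U/Sign is applied.

```python
def extend17(word16, signed):
    """Signed: {w[15], w[15:0]}; unsigned: {1'b0, w[15:0]}."""
    word16 &= 0xFFFF
    top = (word16 >> 15) & 1 if signed else 0
    return (top << 16) | word16


def mul17(a16, b16, a_signed, b_signed):
    """Signed 17 x 17 multiplication; returns the 34-bit product pattern."""
    a = extend17(a16, a_signed)
    b = extend17(b16, b_signed)
    # Reinterpret the 17-bit patterns as two's-complement integers.
    a -= (a >> 16) << 17
    b -= (b >> 16) << 17
    return (a * b) & ((1 << 34) - 1)


if __name__ == "__main__":
    print(hex(mul17(0xFFFF, 0xFFFF, True, True)))    # (-1) * (-1)
    print(hex(mul17(0xFFFF, 0xFFFF, False, False)))  # 65535 * 65535
```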

(11)

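MAC Design – Integer and Fractional

[Figure: the datapath from the previous slide with an I/F/G6 block at the 34-bit output of the 17 x 17 MUL. OMA, OMB, U/Sign, G8, ACR1 (ACR1H, ACR1L), and ACR2 (ACR2H, ACR2L) are unchanged.]

The I/F/G6 block places the 34-bit product in the 40-bit word:

I: {6{[33]},[33:0]}

F: {5{[33]},[33:0],1'b0}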
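
The two alignments can be sketched in Python as below. The function name `align_product` is hypothetical; the bit layouts follow the I and F expressions on the slide, read as sign replication.

```python
def align_product(p34, fractional):
    """Place a 34-bit product pattern in a 40-bit word.

    Integer:    {6 copies of p[33], p[33:0]}
    Fractional: {5 copies of p[33], p[33:0], 0}  (one-bit left shift)
    """
    p34 &= (1 << 34) - 1
    sign = (p34 >> 33) & 1
    if fractional:
        ext = 0x1F if sign else 0
        return (ext << 35) | (p34 << 1)
    ext = 0x3F if sign else 0
    return (ext << 34) | p34


if __name__ == "__main__":
    # 0.5 * 0.5 in (1,15) format: 0x4000 * 0x4000
    acr = align_product(0x4000 * 0x4000, fractional=True)
    print(hex((acr >> 16) & 0xFFFF))  # high word read back in (1,15) format
```

The extra left shift in fractional mode removes the redundant sign bit of the product, so the high word of the result is again in the operands' fractional format.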

(12)

Where is the binary point? (I/F?)

    Stage         Format    Nr of bits
    In REG        (1,15)    16
    Before MUL    (2,15)    17
    After MUL     (4,30)    34
    After I/F     (9,31)    40
    In ACR        (9,31)    40

[Figure: ACR layout with GUARD (8 bits), ACRH (1 + 15 bits), and ACRL (16 bits).]

(13)

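Rounding

Rounding-vector:

{[39:16], 16'b0} + {23'b0, [15], 16'b0}

or

{[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}

[Figure: the 40-bit word drawn as fields of 8, 16, and 16 bits. The added vector is all zeros except for a single bit: X in the first form, a 1-bit constant with value 1 in the second.]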
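
Both rounding forms can be checked with a short Python sketch. The function names `round_acr` and `round_acr_alt` are illustrative; the sketch assumes that only bits 39:16 of the result are read afterwards, which is why the two forms agree.

```python
MASK40 = (1 << 40) - 1


def round_acr(acr40):
    """First form: {acr[39:16], 16'b0} + {23'b0, acr[15], 16'b0}."""
    acr40 &= MASK40
    truncated = acr40 & ~0xFFFF              # clear the low 16 bits
    increment = ((acr40 >> 15) & 1) << 16    # bit 15 added at bit position 16
    return (truncated + increment) & MASK40


def round_acr_alt(acr40):
    """Second form: {acr[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}."""
    acr40 &= MASK40
    return ((acr40 & ~0x7FFF) + 0x8000) & MASK40


if __name__ == "__main__":
    for value in (0x0000017FFF, 0x0000018000, 0xFFFFFFFFFF):
        print(hex(round_acr(value) >> 16), hex(round_acr_alt(value) >> 16))
```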

(14)

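MAC Design – Round for MUL, ACR1, ACR2

[Figure: datapath with a RND block. OMA and OMB (from REG, RF-Port1/Mem1 and REG, RF-Port2/Mem2) feed the 17 x 17 MUL under U/Sign control, followed by I/F/G6. The adder operands are labelled OAA and OAB, each with a "0" input; Guard 8 blocks, ACR1 (ACR1H, ACR1L), and ACR2 (ACR2H, ACR2L) are unchanged.]

Rounding-vector:

{[39:16], 16'b0} + {23'b0, [15], 16'b0}

or

{[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}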

(15)

MAC Design – Long Plus

[Figure: the same datapath: OMA and OMB (from REG, RF-Port1/Mem1 and REG, RF-Port2/Mem2), 17 x 17 MUL with U/Sign, I/F/G8, Guard 8, RND, adder operands OAA and OAB with "0" inputs, ACR1 (ACR1H, ACR1L), and ACR2 (ACR2H, ACR2L).]

RND: pass, no round

Long ADD: ACR1 = ACR1 + ACR2

(16)

MAC Design – Long Minus

[Figure: the same datapath; the 40-bit adder operands OAA and OAB are each widened to 41 bits by one extra bit below the LSB.]

OAA:

ADD: {[39:0],0}

SUB: {[39:0],1}

OAB:

ADD: {[39:0],0}

SUB: {inv[39:0],1}

Long SUB: ACR1 = ACR1 - ACR2
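
The 41-bit operand trick can be sketched in Python: appending a 1 below the LSB of both operands makes the extra bit position generate a carry, which supplies the "+1" of the two's-complement negation. The function name `long_add_sub` is hypothetical, and taking the result from bits 40:1 of the sum is an assumption consistent with the operand layout.

```python
MASK40 = (1 << 40) - 1


def long_add_sub(acr1, acr2, subtract):
    """40-bit add/subtract on a 41-bit adder, as in the OAA/OAB expressions."""
    acr1 &= MASK40
    acr2 &= MASK40
    if subtract:
        oaa = (acr1 << 1) | 1                # {acr1[39:0], 1}
        oab = ((~acr2 & MASK40) << 1) | 1    # {inv(acr2[39:0]), 1}
    else:
        oaa = acr1 << 1                      # {acr1[39:0], 0}
        oab = acr2 << 1                      # {acr2[39:0], 0}
    total = oaa + oab
    return (total >> 1) & MASK40             # drop the helper bit


if __name__ == "__main__":
    print(long_add_sub(5, 3, subtract=False))
    print(long_add_sub(5, 3, subtract=True))
    print(hex(long_add_sub(3, 5, subtract=True)))
```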

(17)

MAC Design – Long ABS

[Figure: the same datapath (OMA, OMB, 17 x 17 MUL with U/Sign, I/F, Guard 8, RND, OAA, OAB, "0" inputs, ACR1, ACR2) with an ABSC control block; the adder is 41 bits wide.]

Adder operands:

ADD: {[39:0],0}

NEG: {[39:0],1}

{39'b0, [39],0}

If (ACR1[39]==1) then ACR1 = {INV(ACR1),0} + {39'b0, [39],0};
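
A minimal Python sketch of the ABS rule: when the sign bit is set, the accumulator is inverted and the sign bit itself is added at the LSB position of the 40-bit value, which is bit 1 of the 41-bit adder. The function name `long_abs` is illustrative.

```python
MASK40 = (1 << 40) - 1


def long_abs(acr1):
    """If ACR1[39] == 1: ACR1 = {INV(ACR1), 0} + {39'b0, ACR1[39], 0}."""
    acr1 &= MASK40
    sign = (acr1 >> 39) & 1
    if not sign:
        return acr1
    oaa = (~acr1 & MASK40) << 1   # {INV(ACR1), 0}
    oab = sign << 1               # {39'b0, ACR1[39], 0}
    return ((oaa + oab) >> 1) & MASK40


if __name__ == "__main__":
    print(long_abs(0xFFFFFFFFFF))        # -1
    print(hex(long_abs(0x8000000000)))   # most negative value
```

The most negative 40-bit value has no positive counterpart, so it maps to itself in this sketch.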

(18)

MAC Design – Saturation

[Figure: the same datapath (OMA, OMB, 17 x 17 MUL with U/Sign, I/F, Guard 8, RND, OAA, OAB, "0" inputs, ABSC, ACR1, ACR2) with a Saturation block; the adder is 41 bits wide.]

If guards <> [39]

    If [39]==0

        Saturation = {9{1'b0}, 31{1'b1}}

    Else

        Saturation = {9{1'b1}, 31{1'b0}}

Else

    Saturation = result[39:0]
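
The saturation rule can be sketched in Python as follows. The overflow test here is an interpretation of "guards <> [39]": the value fits in 32 bits only if bits 39:31 are all equal. The function name `saturate` is hypothetical.

```python
MASK40 = (1 << 40) - 1
MAX32_IN_40 = 0x007FFFFFFF   # nine 0 bits followed by thirty-one 1 bits
MIN32_IN_40 = 0xFF80000000   # nine 1 bits followed by thirty-one 0 bits


def saturate(result40):
    """Clamp a 40-bit result to the signed 32-bit range, kept in 40 bits."""
    result40 &= MASK40
    top9 = result40 >> 31            # bits [39:31]
    if top9 in (0x000, 0x1FF):       # guards agree with the 32-bit sign bit
        return result40
    sign = (result40 >> 39) & 1
    return MIN32_IN_40 if sign else MAX32_IN_40


if __name__ == "__main__":
    print(hex(saturate(0x0080000000)))   # +2^31 overflows upward
    print(hex(saturate(0xFF7FFFFFFF)))   # -2^31 - 1 overflows downward
```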

(24)

csmul1 e0,f0 // x=Re(e0*f0), y=f0

csmul2 o0,e0,e1 // Re(o0)=x-Re(e1), Im(o0)=Im(e0*y)-Im(e1)

cadd o1,e0,e1 // Re(o1)=Re(e0)+Re(e1), Im(o1)=Im(e0)+Im(e1)

o0 = e0*f0 - e1 // 4 MUL, 4 ADD, 4 ARG

o1 = e0 + e1 // 2 ADD, 3 ARG

(a+ib)(c+id) = (ac-bd) + i(ad+bc) // 4 MUL + 2 ADD

On whiteboard

(25)

Instructions

csmul1 a,b // 2 args, 2 mul, 1 add

    tmp = Re(a*b) = Re(a)*Re(b) - Im(a)*Im(b)

    oldb = b

csmul2 o,a,b // 3 args, 2 mul, 3 add

    Re(o) = tmp - Re(b)

    Im(o) = Im(a*oldb) - Im(b) = Im(a)*Re(oldb) + Re(a)*Im(oldb) - Im(b)

cadd o1,e0,e1 // 3 args, 2 add

    Re(o) = Re(a) + Re(b)

    Im(o) = Im(a) + Im(b)
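
The behaviour of the three instructions can be modelled with a short Python sketch. The class `ComplexMacModel` and its use of (re, im) tuples are illustrative assumptions; only the arithmetic follows the definitions above, with `tmp` and `oldb` held as state between `csmul1` and `csmul2`.

```python
class ComplexMacModel:
    """Behavioural model of csmul1 / csmul2 / cadd on (re, im) tuples."""

    def __init__(self):
        self.tmp = 0          # Re(a*b) saved by csmul1
        self.oldb = (0, 0)    # second operand saved by csmul1

    def csmul1(self, a, b):
        self.tmp = a[0] * b[0] - a[1] * b[1]
        self.oldb = b

    def csmul2(self, a, b):
        re = self.tmp - b[0]
        im = a[1] * self.oldb[0] + a[0] * self.oldb[1] - b[1]
        return (re, im)

    @staticmethod
    def cadd(a, b):
        return (a[0] + b[0], a[1] + b[1])


if __name__ == "__main__":
    e0, f0, e1 = (1, 2), (3, 4), (5, 6)
    mac = ComplexMacModel()
    mac.csmul1(e0, f0)
    o0 = mac.csmul2(e0, e1)   # e0*f0 - e1
    o1 = mac.cadd(e0, e1)     # e0 + e1
    print(o0, o1)
```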

(29)

cmul o0,e0,f0 // 4 MUL, 2 ADD

csub o0,o0,e1 // 2 ADD

cadd o1,e0,e1 // 2 ADD

o0 = e0*f0 - e1 // 4 MUL, 4 ADD, 4 ARG

o1 = e0 + e1 // 2 ADD, 3 ARG

(a+ib)(c+id) = (ac-bd) + i(ad+bc) // 4 MUL + 2 ADD

On whiteboard

(30)

k1 = c(a+b)

k2 = a(d-c)

k3 = b(c+d)

(a+ib)(c+id) = (k1-k3) + i(k1+k2) // 3 MUL + 5 ADD

On whiteboard

o0 = e0*f0 - e1

o1 = e0 + e1 // 3 MUL + 9 ADD
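
The three-multiplication identity is easy to verify numerically. The Python sketch below compares it with the four-multiplication form; the function names `cmul4` and `cmul3` are illustrative.

```python
def cmul4(a, b, c, d):
    """(a+ib)(c+id) with 4 multiplications and 2 additions."""
    return (a * c - b * d, a * d + b * c)


def cmul3(a, b, c, d):
    """(a+ib)(c+id) with 3 multiplications and 5 additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)


if __name__ == "__main__":
    print(cmul4(1, 2, 3, 4), cmul3(1, 2, 3, 4))
```

Expanding the terms shows why it works: k1 - k3 = ac + bc - bc - bd = ac - bd, and k1 + k2 = ac + bc + ad - ac = ad + bc.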

(31)

o0 = e0*f0 - e1

o1 = e0 + e1

For e0, e1 (z = f0):

k1 = Re(f0)*[Re(e0)+Im(e0)]

k2 = Re(e0)*[Im(z)-Re(z)]

k3 = Im(e0)*[Re(z)+Im(z)]

x = k1 - Im(e1)

y = k1 - Re(e1)

Re(o0) = y - k3

Im(o0) = x + k2

Re(o1) = Re(e0) + Re(e1)

Im(o1) = Im(e0) + Im(e1)

For e2, e3 (z = f1):

k1 = Re(f1)*[Re(e2)+Im(e2)]

k2 = Re(e2)*[Im(z)-Re(z)]

k3 = Im(e2)*[Re(z)+Im(z)]

x = k1 - Im(e3)

y = k1 - Re(e3)

Re(o2) = y - k3

Im(o2) = x + k2

Re(o3) = Re(e2) + Re(e3)

Im(o3) = Im(e2) + Im(e3)

[Figure: schedule of the operations above; column labels on the slide: MUL,ADD k, ADD Re, ADD Im.]

On whiteboard
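
As a check on the equations for one pair, here is a minimal Python sketch that computes o0 = e0*z - e1 and o1 = e0 + e1 with three multiplications and nine additions. The function name `pair_update` and the (re, im) tuple representation are assumptions made for illustration.

```python
def pair_update(e0, e1, z):
    """Return (o0, o1) with o0 = e0*z - e1 and o1 = e0 + e1 on (re, im) tuples."""
    k1 = z[0] * (e0[0] + e0[1])
    k2 = e0[0] * (z[1] - z[0])
    k3 = e0[1] * (z[0] + z[1])
    x = k1 - e1[1]            # k1 - Im(e1)
    y = k1 - e1[0]            # k1 - Re(e1)
    o0 = (y - k3, x + k2)
    o1 = (e0[0] + e1[0], e0[1] + e1[1])
    return o0, o1


if __name__ == "__main__":
    print(pair_update((1, 2), (5, 6), (3, 4)))
```

Subtracting e1 from k1 before combining with k2 and k3 keeps the count at nine additions, since no separate complex subtraction is needed afterwards.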

(32)

acc0 e0,f0 // no output

acc1 o1,e0,e1

acc2 e0,e1 // no output

acc3 o0,e2,f1

acc1 o3,e2,e3

acc2 e2,e3 // no output

acc4 o2 // no input
