3. MAC Design

(1)

Dake Liu (dake@isy.liu.se) Olle Seger (olle.seger@liu.se)

Andreas Ehliar (ehliar) Jian Wang

•MAC Introduction

•Multiplier

•MAC Design : A Case Study

•Exercises

(2)

MAC

• Most important HW module in the datapath (DP) of any DSP processor

• Supports algorithms such as

– Convolution (the most frequently used operation in DSP algorithms): filtering (FIR, IIR), auto-correlation, cross-correlation

– Transforms (FFT, DCT, etc.)

– Double-precision arithmetic operations

MAC Building Blocks

• Multiplier

• Accumulator

• ACR Registers

• Multiplexers

• Functions (e.g. Rounding, Scaling, Saturation, Flags etc)

(3)

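We use only signed multipliers

Each N-bit operand is extended to N+1 bits according to its U/Sign control, so one signed (N+1)-bit multiplier handles both cases:

• Unsigned +1 = 1.000 is zero-extended to 01.000

• Signed -1 = 1.000 is sign-extended to 11.000

SIGNED MULT, (N+1) bits: inputs A and B (N bits each, each with a U/Sign control, extended to N+1 bits), output P (2N+2 bits).

-1 x -1 (N+1 bit mult)

      11.000      (-1)
    x 11.000      (-1)
    -----------
    0000.000000
    0000.00000
    0000.0000
    1111.000      (-1)
    0010.00       (+2)
    -----------
    0001.000000   (+1)

Correct; saturation will work.

+1 x +1 (N+1 bit mult)

      01.000      (+1)
    x 01.000      (+1)
    -----------
    0000.000000
    0000.00000
    0000.0000
    0001.000      (+1)
    0000.00
    -----------
    0001.000000   (+1)

Correct; saturation will work.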
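
The extension rule above can be tried out in a few lines of Python. This is a minimal sketch, not the course's implementation: the function names `extend` and `signed_mult` are hypothetical, and N = 4 is chosen only so that the bit patterns match the 1.000 examples on the slide.

```python
def extend(value, n, signed):
    """Extend an N-bit operand to N+1 bits and return its integer value.

    Unsigned operands are zero-extended, signed operands are sign-extended.
    """
    value &= (1 << n) - 1
    if signed and (value >> (n - 1)) & 1:
        value -= 1 << n  # the MSB is copied into the extra bit
    return value


def signed_mult(a, b, n, a_signed, b_signed):
    """Multiply two extended operands; return the (2N+2)-bit product pattern."""
    product = extend(a, n, a_signed) * extend(b, n, b_signed)
    return product & ((1 << (2 * n + 2)) - 1)


# Pattern 1.000 with N = 4: unsigned it means +1, signed it means -1.
one = 0b1000
print(format(signed_mult(one, one, 4, True, True), "010b"))    # -1 x -1
print(format(signed_mult(one, one, 4, False, False), "010b"))  # +1 x +1
print(format(signed_mult(one, one, 4, True, False), "010b"))   # -1 x +1
```

With the binary point placed after the fourth bit, the first two results read 0001.000000 (+1) and the mixed case reads 1111.000000 (-1).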

(4)

MAC Design: A case study

• Functions

– Integer / Fractional multiplication (16 x 16)

– Signed / Unsigned multiplication

– Convolution with 8 guards and initialization

– Round

– Saturation

– 32 bits Long Plus and Minus

– 32 bits Long operation: ABS

(5)

MAC Design: Start

Blocks: MUL (inputs OMA[15:0] and OMB[15:0]), adder with accumulation feedback, ACR with keeper function, and a "0" input.

• Here, ACR is the accumulation register. ACR is 32 bits according to the SPEC.

• "0" gives initialization to ACR for a convolution.
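
As a behavioural illustration of this first step, the sketch below accumulates 16 x 16 signed products into a 32-bit register that starts from "0". The names `to_signed16` and `mac_convolution` are hypothetical, and wrapping the register modulo 2^32 is an assumption about how a 32-bit ACR without guard bits would behave.

```python
def to_signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mac_convolution(oma_words, omb_words):
    """Sum of products in a 32-bit ACR that is initialized with "0"."""
    acr = 0  # the "0" input: the accumulation starts from zero
    for a, b in zip(oma_words, omb_words):
        product = to_signed16(a) * to_signed16(b)
        acr = (acr + product) & 0xFFFFFFFF  # assumed: 32-bit register wraps
    return acr


print(mac_convolution([1, 2, 3], [4, 5, 6]))
print(hex(mac_convolution([0x7FFF] * 3, [0x7FFF] * 3)))
```

In the second call the true sum is positive, but bit 31 of the 32-bit register ends up set, so the result would read as negative. This is the overflow that the guard bits listed among the MAC functions are meant to absorb.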

(6)

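MAC Design: Guard at MUL-out

Blocks: MUL (inputs OMA[15:0] and OMB[15:0]), guard G8, adder, ACR, and the "0" input.

• The MUL output is 32 bits; G8 extends it to 40 bits.

• G8 at the MUL output is: out[39:32] = {8{in[31]}}

• The accumulation feedback is 40 bits now!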
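
The guard rule out[39:32] = {8{in[31]}} is a plain sign extension from 32 to 40 bits. A minimal Python sketch follows; the function name `guard8` is hypothetical.

```python
def guard8(product32):
    """Extend a 32-bit pattern to 40 bits: out[39:32] = {8{in[31]}}."""
    product32 &= 0xFFFFFFFF
    sign = (product32 >> 31) & 1
    guards = 0xFF if sign else 0x00  # eight copies of the sign bit
    return (guards << 32) | product32


print(hex(guard8(0x80000000)))  # negative product: guards are all ones
print(hex(guard8(0x7FFFFFFF)))  # positive product: guards are all zeros
```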

(7)

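MAC Design: Input MUX for MUL

• MUL input comes from the register file or from memory.

• OMA selects between REG (RF-Port1) and Mem1; OMB selects between REG (RF-Port2) and Mem2.

• Widths are unchanged: the MUL output is 32 bits, 40 bits after G8, and the adder and ACR are 40 bits.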

(8)

MAC Design: Load to ACR

• ACR is split into ACRH (bits 39:16, 24 bits) and ACRL (bits 15:0, 16 bits).

• Guard8 following "0", OMA and OMB is: out[23:16] = {8{in[15]}} (16 bits extended to 24 bits)

• Double load to ACR: merge {H[23:0], L[15:0]} into 40 bits.
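
The double load can be sketched as follows. The function names are hypothetical, and the sketch assumes that the guard copies the sign bit of the 16-bit input (bit 15) into bits 23:16 before the merge {H[23:0], L[15:0]}.

```python
def guard8_16(word16):
    """Sign-extend a 16-bit word to 24 bits (assumed: bit 15 is the sign)."""
    word16 &= 0xFFFF
    return word16 | (0xFF0000 if word16 & 0x8000 else 0)


def double_load(high16, low16):
    """Merge {H[23:0], L[15:0]} into a 40-bit ACR value."""
    return (guard8_16(high16) << 16) | (low16 & 0xFFFF)


print(hex(double_load(0x1234, 0x5678)))
print(hex(double_load(0x8000, 0x0001)))
```

The second call loads a negative high word, so the eight guard bits of the 40-bit value come out as ones.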

(9)

MAC Design: ACR1 and ACR2 are used

• Two accumulator registers are used for MAC: ACR1 (ACR1H, ACR1L) and ACR2 (ACR2H, ACR2L).

• Each accumulator is 40 bits: 24 bits high and 16 bits low. OMA and OMB stay 16 bits, with G8 guards on the load paths.

(10)

MAC Design: Signed and Unsigned

• Each 16-bit operand is extended to 17 bits according to its U/Sign control:

    Signed:   A,B = {[15],[15:0]}
    Unsigned: A,B = {1'b0,[15:0]}

• MUL is 17 x 17 bits and its output is 34 bits.

• Guard 6 extends the 34-bit product to 40 bits.

(11)

MAC Design: Integer and Fractional

The I/F/G6 block after the 17 x 17 MUL aligns the 34-bit product [33:0] to 40 bits:

• I (integer): {6{[33]},[33:0]}

• F (fractional): {5{[33]},[33:0],1'b0}
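
The two alignments can be checked numerically. The sketch below is an illustration under stated assumptions: `extend17` and `mul_to_acr` are hypothetical names, and Python's unbounded integers stand in for the sign-extension guards, so masking to 40 bits produces the same pattern as replicating bit [33].

```python
MASK40 = (1 << 40) - 1


def extend17(word16, signed):
    """17-bit extension: {[15],[15:0]} if signed, {1'b0,[15:0]} if unsigned."""
    word16 &= 0xFFFF
    if signed and word16 & 0x8000:
        return word16 - 0x10000
    return word16


def mul_to_acr(a16, b16, signed=True, fractional=False):
    """17 x 17 multiply followed by integer or fractional alignment."""
    product = extend17(a16, signed) * extend17(b16, signed)  # fits in 34 bits
    if fractional:
        product <<= 1  # fractional: five guards, the product, then one 0
    return product & MASK40  # integer: six guards, then the product


print(hex(mul_to_acr(0x4000, 0x4000, fractional=True)))  # 0.5 x 0.5
print(hex(mul_to_acr(0x8000, 0x8000, fractional=True)))  # -1 x -1
print(hex(mul_to_acr(0x0003, 0xFFFE)))                   # 3 x -2, integer
```

In fractional mode -1 x -1 gives 0x0080000000, which is +1 with the binary point below bit 31. It does not fit in 32 bits but is held correctly in the 40-bit value.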

(12)

I/F? Where is the binary point?

Number of bits (before the binary point, after the binary point) at each stage:

• In REG: (1,15)

• Before MUL: (2,15)

• After MUL: (4,30)

• After I/F: (9,31)

• In ACR: (9,31), that is 8 guard bits, then 1 + 15 bits in ACRH and 16 bits in ACRL

(13)

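Rounding

Rounding-vector:

    {[39:16], 16'b0} + {23'b0, [15], 16'b0}

or

    {[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}

The 40-bit word consists of 8 guard bits, 16 high bits and 16 low bits. The first form clears the low 16 bits and adds bit [15] at position 16. The second form clears the low 15 bits and adds 1 bit, value = 1, at position 15.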
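
Both rounding vectors can be compared in Python. This is a sketch with hypothetical function names; it shows that the two forms agree on bits 39:16, which are the bits kept after rounding.

```python
MASK40 = (1 << 40) - 1


def round_v1(acr):
    """{[39:16], 16'b0} + {23'b0, [15], 16'b0}"""
    truncated = acr & MASK40 & ~0xFFFF   # clear the low 16 bits
    increment = ((acr >> 15) & 1) << 16  # bit 15 moved to position 16
    return (truncated + increment) & MASK40


def round_v2(acr):
    """{[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}"""
    truncated = acr & MASK40 & ~0x7FFF   # clear the low 15 bits
    return (truncated + (1 << 15)) & MASK40


for value in (0x0000017FFF, 0x0000018000, 0xFFFFFF8000):
    print(hex(value), hex(round_v1(value) >> 16), hex(round_v2(value) >> 16))
```

The low bits of the two results can differ (the second form leaves bit 15 inverted), so only the upper 24 bits are compared.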

(14)

MAC Design: Round for MUL, ACR1, ACR2

• A RND block is added so that the MUL result, ACR1 and ACR2 can be rounded. The adder operands are OAA and OAB (40 bits), with "0" selectable.

• Rounding-vector:

    {[39:16], 16'b0} + {23'b0, [15], 16'b0}

or

    {[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}

(15)

MAC Design: Long Plus

• Long ADD: ACR1 = ACR1 + ACR2

• RND: pass, no round.

• The rest of the datapath is as before: OMA and OMB from REG (RF-Port1, RF-Port2) or Mem1 and Mem2, 17 x 17 MUL, adder operands OAA and OAB (40 bits), "0" selectable.

(16)

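MAC Design: Long Minus

• Long SUB: ACR1 = ACR1 - ACR2

• The adder is now 41 bits wide; one bit is appended below each 40-bit operand:

    OAA   ADD: {[39:0],0}   SUB: {[39:0],1}
    OAB   ADD: {[39:0],0}   SUB: {inv[39:0],1}

• For SUB, the two appended 1s generate a carry into the next bit, so the adder computes ACR1 + inv(ACR2) + 1 = ACR1 - ACR2. The 40-bit result is taken from the upper 40 bits of the 41-bit sum.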
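
The 41-bit adder trick can be verified with a short behavioural model. The names below are hypothetical, and the sketch assumes that OAA carries ACR1, OAB carries ACR2, and the result is read from bits 40:1 of the sum.

```python
MASK40 = (1 << 40) - 1
MASK41 = (1 << 41) - 1


def long_addsub(acr1, acr2, subtract):
    """ACR1 + ACR2 or ACR1 - ACR2 on a 41-bit adder with an appended LSB."""
    acr1 &= MASK40
    acr2 &= MASK40
    if subtract:
        oaa = (acr1 << 1) | 1               # {[39:0], 1}
        oab = ((~acr2 & MASK40) << 1) | 1   # {inv[39:0], 1}
    else:
        oaa = acr1 << 1                     # {[39:0], 0}
        oab = acr2 << 1                     # {[39:0], 0}
    total = (oaa + oab) & MASK41
    return total >> 1                       # drop the appended bit


print(long_addsub(5, 3, subtract=False))
print(long_addsub(5, 3, subtract=True))
print(hex(long_addsub(3, 5, subtract=True)))  # -2 as a 40-bit pattern
```

The appended bit acts as the carry-in of a two's-complement subtractor, so no separate increment step is needed.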

(17)

MAC Design: Long ABS

• An ABSC control is added; the adder is 41 bits.

• Operand options at the adder:

    {[39:0],0}        ADD
    {[39:0],1}        NEG
    {39'b0,[39],0}

• If (ACR1[39]==1) then ACR1 = {INV(ACR1),0} + {39'b0,[39],0};

• {39'b0,[39],0} adds 1 to the 40-bit value when ACR1 is negative, so the result is INV(ACR1) + 1 = -ACR1.
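
The ABS rule can be modelled in the same style. This is a sketch with a hypothetical function name, assuming the 40-bit result is read from bits 40:1 of the 41-bit sum.

```python
MASK40 = (1 << 40) - 1
MASK41 = (1 << 41) - 1


def long_abs(acr1):
    """If ACR1[39] == 1: ACR1 = {INV(ACR1),0} + {39'b0,[39],0}."""
    acr1 &= MASK40
    sign = (acr1 >> 39) & 1
    if sign == 0:
        return acr1                    # already non-negative
    oaa = (~acr1 & MASK40) << 1        # {INV(ACR1), 0}
    oab = sign << 1                    # {39'b0, [39], 0}
    return ((oaa + oab) & MASK41) >> 1


print(long_abs(-5 & MASK40))
print(long_abs(7))
print(hex(long_abs(1 << 39)))  # most negative value maps to itself
```

The last call shows the usual two's-complement corner case: the most negative 40-bit value has no positive counterpart, so the result is unchanged.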

(18)

MAC Design: Saturation

• A Saturation block is added at the output of the 41-bit adder path.

    If guards <> [39]          (the upper bits do not all equal the sign bit [39])
        If [39] == 0
            Saturation = {9'b0, {31{1'b1}}}     (largest positive 32-bit value)
        Else
            Saturation = {{9{1'b1}}, 31'b0}     (most negative 32-bit value)
    Else
        Saturation = result[39:0]
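
A Python sketch of the saturation rule follows. The function name is hypothetical, and the overflow test is an assumption: it compares all of bits 39:31 with the sign bit, so that the two saturated values themselves count as in range.

```python
def saturate(result):
    """Clamp a 40-bit result to the range of a 32-bit signed value."""
    result &= (1 << 40) - 1
    top9 = (result >> 31) & 0x1FF      # guard bits plus bit 31
    if top9 in (0x000, 0x1FF):
        return result                  # all nine bits agree: no overflow
    if ((result >> 39) & 1) == 0:
        return 0x007FFFFFFF            # nine 0s followed by 31 ones
    return 0xFF80000000                # nine 1s followed by 31 zeros


print(hex(saturate(0x0080000000)))  # positive overflow
print(hex(saturate(0xFF7FFFFFFF)))  # negative overflow
print(hex(saturate(0x0012345678)))  # in range, passed through
```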


(24)

csmul1 e0,f0      // x=Re(e0*f0), y=f0
csmul2 o0,e0,e1   // Re(o0)=x-Re(e1), Im(o0)=Im(e0*y)-Im(e1)
cadd   o1,e0,e1   // Re(o1)=Re(e0)+Re(e1), Im(o1)=Im(e0)+Im(e1)

o0 = e0*f0 - e1   // 4 MUL, 4 ADD, 4 ARG
o1 = e0 + e1      // 2 ADD, 3 ARG

(a+ib)(c+id) = (ac-bd) + i(ad+bc)   4 MUL + 2 ADD

On whiteboard

(25)

Instructions

csmul1 a,b        (2 args, 2 mul, 1 add)

    tmp  = Re(a*b)
         = Re(a)*Re(b) - Im(a)*Im(b)
    oldb = b

csmul2 o,a,b      (3 args, 2 mul, 3 add)

    Re(o) = tmp - Re(b)
    Im(o) = Im(a*oldb) - Im(b)
          = Im(a)*Re(oldb) + Re(a)*Im(oldb) - Im(b)

cadd o1,e0,e1     (3 args, 2 add)

    Re(o) = Re(a) + Re(b)
    Im(o) = Im(a) + Im(b)
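
The instruction semantics above can be modelled in Python to check that csmul1 followed by csmul2 yields o0 = e0*f0 - e1. This is a behavioural sketch only: the class `ComplexUnit` is hypothetical, and it keeps `tmp` and `oldb` as internal state between the two instructions, as the definitions suggest.

```python
class ComplexUnit:
    """Behavioural model of csmul1, csmul2 and cadd (illustrative only)."""

    def __init__(self):
        self.tmp = 0.0   # Re(a*b) saved by csmul1
        self.oldb = 0j   # second operand saved by csmul1

    def csmul1(self, a, b):
        # 2 mul, 1 add: real part of the product, no output
        self.tmp = a.real * b.real - a.imag * b.imag
        self.oldb = b

    def csmul2(self, a, b):
        # 2 mul, 3 add: finish the product and subtract b
        re = self.tmp - b.real
        im = a.imag * self.oldb.real + a.real * self.oldb.imag - b.imag
        return complex(re, im)

    @staticmethod
    def cadd(a, b):
        # 2 add
        return complex(a.real + b.real, a.imag + b.imag)


unit = ComplexUnit()
e0, e1, f0 = 1 + 2j, 0.5 + 0.25j, 3 - 1j
unit.csmul1(e0, f0)
o0 = unit.csmul2(e0, e1)
o1 = unit.cadd(e0, e1)
print(o0, o1)
```

The printed o0 equals e0*f0 - e1 and o1 equals e0 + e1 for these operands.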


(29)

cmul o0,e0,f0     // 4 MUL, 2 ADD
csub o0,o0,e1     // 2 ADD
cadd o1,e0,e1     // 2 ADD

o0 = e0*f0 - e1   // 4 MUL, 4 ADD, 4 ARG
o1 = e0 + e1      // 2 ADD, 3 ARG

(a+ib)(c+id) = (ac-bd) + i(ad+bc)   4 MUL + 2 ADD

On whiteboard

(30)

k1 = c(a+b)
k2 = a(d-c)
k3 = b(c+d)

(a+ib)(c+id) = (k1-k3) + i(k1+k2)   3 MUL + 5 ADD

On whiteboard

o0 = e0*f0 - e1
o1 = e0 + e1      3 MUL + 9 ADD
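
The three-multiplication identity is easy to check numerically. The sketch below uses a hypothetical function name and plain integers for the real and imaginary parts.

```python
def cmul3(a, b, c, d):
    """(a+ib)(c+id) with 3 multiplications and 5 additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)  # (real part, imaginary part)


# Compare with the direct form (ac-bd) + i(ad+bc).
a, b, c, d = 1, 2, 3, 4
print(cmul3(a, b, c, d), (a * c - b * d, a * d + b * c))
```

Expanding the terms shows why it works: k1 - k3 = ac + bc - bc - bd = ac - bd, and k1 + k2 = ac + bc + ad - ac = ad + bc.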

(31)

o0 = e0*f0 - e1
o1 = e0 + e1

The operations are arranged on the slide in three columns: MUL, ADD (k); ADD (Re); ADD (Im).

First pair, z = f0:

    k1 = Re(f0)*[Re(e0)+Im(e0)]
    k2 = Re(e0)*[Im(z)-Re(z)]
    k3 = Im(e0)*[Re(z)+Im(z)]
    x  = k1 - Im(e1)
    y  = k1 - Re(e1)
    Re(o0) = y - k3
    Im(o0) = x + k2
    Re(o1) = Re(e0) + Re(e1)
    Im(o1) = Im(e0) + Im(e1)

Second pair, z = f1:

    k1 = Re(f1)*[Re(e2)+Im(e2)]
    k2 = Re(e2)*[Im(z)-Re(z)]
    k3 = Im(e2)*[Re(z)+Im(z)]
    y  = k1 - Re(e3)
    x  = k1 - Im(e3)
    Re(o2) = y - k3
    Im(o2) = x + k2
    Re(o3) = Re(e2) + Re(e3)
    Im(o3) = Im(e2) + Im(e3)

On whiteboard
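
The step list for one output pair can be written out and checked against the direct formulas o0 = e0*z - e1 and o1 = e0 + e1. This is a sketch: the function name `mul_sub_add` is hypothetical, and complex values are passed as (real, imaginary) tuples.

```python
def mul_sub_add(e0, e1, z):
    """o0 = e0*z - e1 and o1 = e0 + e1 with 3 MUL and 9 ADD."""
    k1 = z[0] * (e0[0] + e0[1])   # Re(z)*[Re(e0)+Im(e0)]
    k2 = e0[0] * (z[1] - z[0])    # Re(e0)*[Im(z)-Re(z)]
    k3 = e0[1] * (z[0] + z[1])    # Im(e0)*[Re(z)+Im(z)]
    x = k1 - e1[1]                # k1 - Im(e1)
    y = k1 - e1[0]                # k1 - Re(e1)
    o0 = (y - k3, x + k2)
    o1 = (e0[0] + e1[0], e0[1] + e1[1])
    return o0, o1


print(mul_sub_add((1, 2), (1, 1), (3, 4)))
```

For e0 = 1+2i, z = 3+4i and e1 = 1+i, the product e0*z is -5+10i, so o0 should be -6+9i and o1 should be 2+3i.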

(32)

acc0 e0,f0        // no output
acc1 o1,e0,e1
acc2 e0,e1        // no output
acc3 o0,e2,f1
acc1 o3,e2,e3
acc2 e2,e3        // no output
acc4 o2           // no input

(33)