3. MAC Design
Dake Liu (dake@isy.liu.se) Olle Seger (olle.seger@liu.se)
Andreas Ehliar (ehliar) Jian Wang
• MAC Introduction
• Multiplier
• MAC Design: A Case Study
• Exercises
MAC
• Most important HW module in the datapath (DP) of any DSP processor
• Supports algorithms like
– Convolution (most frequently used in DSP algorithms): filtering (FIR, IIR), auto-correlation, cross-correlation
– Transforms (FFT, DCT etc.)
– Double-precision arithmetic operations
MAC Building Blocks
• Multiplier
• Accumulator
• ACR Registers
• Multiplexers
• Functions (e.g. Rounding, Scaling, Saturation, Flags etc)
We use only Signed Multipliers
• Unsigned +1 = 1.000 is extended to 01.000
• Signed -1 = 1.000 is extended to 11.000
[Figure: SIGNED MULT, (N+1) bits. Inputs A and B are N bits each and pass through a U/Sign extension to N+1 bits; output P is 2N+2 bits.]
-1 x -1 (N+1 bit Mult)
    11.000 (-1)
  x 11.000 (-1)
Partial products:
    0000.000000
    0000.00000
    0000.0000
    1111.000 (-1)
    0010.00 (+2)
Sum:
    0001.000000 (+1)
Correct; Sat. will work

+1 x +1 (N+1 bit Mult)
    01.000 (+1)
  x 01.000 (+1)
Partial products:
    0000.000000
    0000.00000
    0000.0000
    0001.000 (+1)
    0000.00
Sum:
    0001.000000 (+1)
Correct; Sat. will work
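The extension rule can be checked numerically. The Python sketch below is only an illustration: the helper names `extend` and `signed_mul` are made up here, and N = 4 is chosen to match the 1.000 pattern used in the examples. It extends each N-bit operand by one bit (sign or zero extension) and feeds both into one signed multiply.

```python
def extend(x, n, signed):
    """Return the value of the n-bit pattern x after extension to n+1 bits.

    signed=True  replicates the MSB (sign extension),
    signed=False prepends a 0 (zero extension).
    """
    x &= (1 << n) - 1
    if signed and (x >> (n - 1)) & 1:
        return x - (1 << n)   # MSB set: the pattern is a negative number
    return x                  # unsigned, or signed and non-negative


def signed_mul(a, b, n, a_signed, b_signed):
    """Multiply two n-bit patterns with a single signed (n+1)-bit multiply.

    Returns the (2n+2)-bit two's-complement product pattern.
    """
    p = extend(a, n, a_signed) * extend(b, n, b_signed)
    return p & ((1 << (2 * n + 2)) - 1)


ONE = 0b1000  # pattern 1.000 with three fractional bits
print(format(signed_mul(ONE, ONE, 4, True, True), "010b"))    # -1 x -1
print(format(signed_mul(ONE, ONE, 4, False, False), "010b"))  # +1 x +1
```

Both calls produce the same product pattern because (-1) x (-1) and (+1) x (+1) are both +1; a mixed signed/unsigned call gives -1.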
MAC Design: A case study
• Functions
– Integer / Fractional multiplication (16 x 16)
– Signed / Unsigned multiplication
– Convolution with 8 guards and initialization
– Round
– Saturation
– 32 bits Long Plus and Minus
– 32 bits Long operation: ABS
MAC Design – Start
[Figure: MUL with inputs OMA[15:0] and OMB[15:0], followed by ACR; a "0" input and the accumulation feedback path; ACR keeper function.]
• Here: ACR is the accumulation register. ACR is 32 bits according to SPEC.
• "0" gives initialization to ACR for a convolution
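MAC Design – Guard at MUL-out
[Figure: as before, with an 8-bit guard block (G8) that extends the 32-bit MUL output to 40 bits; the "0" input, ACR and the feedback path are 40 bits wide.]
• Accumulation feedback is 40 bits now!
• G8 at MUL output is: out[39:32] = {8{in[31]}}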
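A minimal Python sketch of the G8 block, assuming bit patterns are held in plain integers. The names `guard8` and `to_signed` are illustrative only. The loop shows why the guards matter: 200 products close to the 32-bit maximum overflow a 32-bit accumulator but still fit in 40 bits.

```python
MASK40 = (1 << 40) - 1


def guard8(x32):
    """Sign-extend a 32-bit pattern to 40 bits: out[39:32] = {8{in[31]}}."""
    x32 &= 0xFFFFFFFF
    guards = 0xFF if (x32 >> 31) & 1 else 0x00
    return (guards << 32) | x32


def to_signed(x, bits):
    """Interpret a bit pattern of the given width as a two's-complement integer."""
    return x - (1 << bits) if (x >> (bits - 1)) & 1 else x


acc = 0  # "0" initializes the accumulator
for _ in range(200):
    acc = (acc + guard8(0x7FFF0000)) & MASK40

print(to_signed(acc, 40) == 200 * 0x7FFF0000)
```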
MAC Design – Input MUX for MUL
[Figure: as before, with an input multiplexer and REG in front of each MUL input. OMA is selected from RF-Port1 or Mem1; OMB is selected from RF-Port2 or Mem2.]
• MUL input from Register File or Mem.
MAC Design – Load to ACR
[Figure: as before, with ACR split into ACRH (bits 39:16, 24 bits) and ACRL (bits 15:0, 16 bits), and G8 blocks on the load paths.]
• Guard8 following "0", OMA and OMB is: out[23:16] = {8{in[15]}}
• Double load to ACR: merge {H[23:0], L[15:0]}
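The double load can be sketched in Python as below. This assumes the sign bit of a 16-bit operand is bit 15, and that the guarded operand goes to the high part; the function and argument names (`guard8_16`, `load_acr`, `high16`, `low16`) are hypothetical.

```python
def guard8_16(x16):
    """Sign-extend a 16-bit pattern to 24 bits: out[23:16] = {8{in[15]}}."""
    x16 &= 0xFFFF
    guards = 0xFF if (x16 >> 15) & 1 else 0x00
    return (guards << 16) | x16


def load_acr(high16, low16):
    """Double load: ACR[39:0] = {H[23:0], L[15:0]}."""
    h24 = guard8_16(high16)          # guarded high part, goes to ACRH
    return (h24 << 16) | (low16 & 0xFFFF)


print(hex(load_acr(0x8000, 0x0001)))
```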
MAC Design – ACR1 and ACR2 are used
[Figure: as before, with two 40-bit accumulator registers, ACR1 = {ACR1H, ACR1L} and ACR2 = {ACR2H, ACR2L}, each split into 24 + 16 bits.]
• Two accumulator registers are used for MAC
MAC Design – Signed and Unsigned
[Figure: as before, with a U/Sign block on each MUL input (16 to 17 bits), a 17 x 17 MUL with a 34-bit product, and Guard 6 at the MUL output.]
• Signed: A,B = {[15],[15:0]}
• Unsigned: A,B = {1'b0,[15:0]}
MAC Design – Integer and Fractional
[Figure: as before, with an I/F/G6 block at the MUL output, controlled by I/F.]
• I: {6{[33]},[33:0]}
• F: {5{[33]},[33:0],1'b0}
Where is the binary point? Nr of bits (integer, fractional):
• Before MUL (2,15)
• After MUL (4,30)
• After I/F (9,31)
• In ACR (9,31)
• In REG (1,15)
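A Python sketch of the I/F alignment of the 34-bit product into the 40-bit accumulator format, assuming the replication reading of the slide notation (6 or 5 copies of bit 33). The name `align_product` is illustrative. In fractional mode the product is shifted left by one, so that 0.5 x 0.5 in (1,15) format lands with 0.25 in the upper 16 bits of the 32-bit result.

```python
def align_product(p34, fractional):
    """Place a 34-bit signed product pattern into a 40-bit word.

    Integer:    {6{p[33]}, p[33:0]}
    Fractional: {5{p[33]}, p[33:0], 1'b0}
    """
    p34 &= (1 << 34) - 1
    sign = (p34 >> 33) & 1
    if fractional:
        guards = 0b11111 if sign else 0
        return (guards << 35) | (p34 << 1)
    guards = 0b111111 if sign else 0
    return (guards << 34) | p34


half = 0x4000                                   # 0.5 in (1,15) format
acc = align_product(half * half, fractional=True)
print(hex((acc >> 16) & 0xFFFF))                # upper 16 bits of the 32-bit result
```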
Rounding
Rounding-vector:
{[39:16], 16'b0} + {23'b0, [15], 16'b0}
or
{[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}
[Figure: 40-bit word drawn as 8 + 16 + 16 bits; the rounding vector is all zeros except for one bit of value 1.]
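The two rounding-vector forms can be compared with a short Python sketch (function names `round_v1` and `round_v2` are illustrative). The first form clears the low 16 bits and adds bit 15 at position 16; the second clears the low 15 bits and adds a constant 1 at position 15. They differ only below bit 16, so the rounded upper part [39:16] is the same.

```python
MASK40 = (1 << 40) - 1


def round_v1(acc):
    """{acc[39:16], 16'b0} + {23'b0, acc[15], 16'b0}"""
    truncated = acc & MASK40 & ~0xFFFF
    bit15 = (acc >> 15) & 1
    return (truncated + (bit15 << 16)) & MASK40


def round_v2(acc):
    """{acc[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}"""
    truncated = acc & MASK40 & ~0x7FFF
    return (truncated + (1 << 15)) & MASK40


for acc in (0x0000018000, 0x0000017FFF, 0x00FFFF8000, 0xFFFFFF8000):
    # Both forms agree on the rounded upper 24 bits
    print(hex(round_v1(acc) >> 16), hex(round_v2(acc) >> 16))
```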
MAC Design – Round for MUL, ACR1, ACR2
[Figure: as before, with adder operand buses OAA and OAB, "0" inputs on the operand multiplexers, and an RND block.]
Rounding-vector:
{[39:16], 16'b0} + {23'b0, [15], 16'b0}
or
{[39:15], 15'b0} + {24'b0, 1'b1, 15'b0}
MAC Design – Long Plus
[Figure: as before; the block at the MUL output is labelled I/F/G8.]
• RND: pass, no round
• Long ADD: ACR1 = ACR1 + ACR2
MAC Design – Long Minus
[Figure: as before, with one extra LSB appended to each adder operand (40 to 41 bits).]
• OAA: ADD: {[39:0],1'b0}; SUB: {[39:0],1'b1}
• OAB: ADD: {[39:0],1'b0}; SUB: {inv[39:0],1'b1}
• Long SUB: ACR1 = ACR1 - ACR2
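The extra LSB is a way to get the "+1" of two's-complement negation out of the same adder. For SUB, both appended bits are 1, so their sum generates a carry into bit 1, and the upper 40 bits become A + inv(B) + 1 = A - B. A Python sketch, with the illustrative name `long_add_sub`:

```python
MASK40 = (1 << 40) - 1
MASK41 = (1 << 41) - 1


def long_add_sub(a, b, subtract):
    """40-bit add/subtract on a 41-bit adder, returning the 40-bit result."""
    a &= MASK40
    b &= MASK40
    if subtract:
        op_a = (a << 1) | 1                  # {A[39:0], 1}
        op_b = (((~b) & MASK40) << 1) | 1    # {inv(B[39:0]), 1}
    else:
        op_a = a << 1                        # {A[39:0], 0}
        op_b = b << 1                        # {B[39:0], 0}
    total = (op_a + op_b) & MASK41
    return total >> 1                        # drop the helper LSB


print(long_add_sub(5, 3, subtract=False))
print(long_add_sub(5, 3, subtract=True))
```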
MAC Design – Long ABS
[Figure: as before, with the RND and I/F blocks, an ABSC control on the adder operands, and a 41-bit adder.]
{[39:0],0} ADD {[39:0],1} NEG {39'b0,[39],0}
If (ACR1[39]==1) then ACR1 = {INV(ACR1),0} + {39'b0,[39],0};
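A Python sketch of the ABS rule on the 41-bit adder, using the illustrative name `long_abs`. For a negative ACR1 the inverted value is added to a vector that carries the sign bit [39] one position above the helper LSB, which supplies the "+1" of the negation. Note that the most negative 40-bit value has no positive counterpart and maps to itself before saturation.

```python
MASK40 = (1 << 40) - 1
MASK41 = (1 << 41) - 1


def long_abs(acr1):
    """Return |ACR1| as a 40-bit pattern."""
    acr1 &= MASK40
    sign = (acr1 >> 39) & 1
    if sign == 0:
        return acr1                       # already non-negative
    op_a = ((~acr1) & MASK40) << 1        # {INV(ACR1), 0}
    op_b = sign << 1                      # {39'b0, ACR1[39], 0}
    return ((op_a + op_b) & MASK41) >> 1


print(long_abs((-5) & MASK40))
print(long_abs(7))
```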
MAC Design – Saturation
[Figure: as before, with a Saturation block after the 41-bit adder.]
If guards <> [39]
    If [39]==0
        Saturation = {9'b0, 31'b1} (9 zeros followed by 31 ones)
    Else
        Saturation = {9'b1, 31'b0} (9 ones followed by 31 zeros)
Else
    Saturation = result[39:0]
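A Python sketch of the saturation rule. It assumes that "guards <> [39]" means the bits [39:31] are not all equal, which is the condition under which the 40-bit result does not fit in 32 bits. The name `saturate` is illustrative.

```python
MASK40 = (1 << 40) - 1


def saturate(result):
    """Clamp a 40-bit result pattern to the 32-bit signed range."""
    result &= MASK40
    top9 = result >> 31                   # bits [39:31]
    if top9 == 0 or top9 == 0x1FF:
        return result                     # fits in 32 bits: pass through
    sign = (result >> 39) & 1
    if sign == 0:
        return 0x007FFFFFFF               # 9 zeros, 31 ones: largest positive
    return 0xFF80000000                   # 9 ones, 31 zeros: most negative


print(hex(saturate(0x0080000000)))        # positive overflow
print(hex(saturate(0xFF7FFFFFFF)))        # negative overflow
print(hex(saturate(0x0012345678)))        # in range
```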
csmul1 e0,f0 // x=Re(e0*f0), y=f0
csmul2 o0,e0,e1 // Re(o0)=x-Re(e1), Im(o0)=Im(e0*y)-Im(e1)
cadd o1,e0,e1 // Re(o1)=Re(e0)+Re(e1), Im(o1)=Im(e0)+Im(e1)
o0 = e0*f0 – e1 // 4 MUL, 4 ADD, 4 ARG
o1 = e0 + e1 // 2 ADD, 3 ARG
(a+ib)(c+id) = (ac-bd) + i(ad+bc) 4MUL + 2ADD
On whiteboard
Instructions
csmul1 a,b // 2 args, 2 mul, 1 add
    tmp = Re(a*b)
        = Re(a)*Re(b) – Im(a)*Im(b)
    oldb = b
csmul2 o,a,b // 3 args, 2 mul, 3 add
    Re(o) = tmp – Re(b)
    Im(o) = Im(a*oldb) – Im(b)
          = Im(a)*Re(oldb) + Re(a)*Im(oldb) – Im(b)
cadd o1,e0,e1 // 3 args, 2 add
    Re(o) = Re(a) + Re(b)
    Im(o) = Im(a) + Im(b)
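The instruction semantics can be tried out with a small behavioural model in Python. This is a sketch only: the class name `ComplexUnit` is made up, Python `complex` values stand in for the register contents, and `tmp` and `oldb` model the hidden state that csmul1 leaves for csmul2.

```python
class ComplexUnit:
    """Toy model of csmul1 / csmul2 / cadd."""

    def __init__(self):
        self.tmp = 0.0
        self.oldb = 0j

    def csmul1(self, a, b):
        # 2 mul, 1 add: tmp = Re(a*b); remember b for csmul2
        self.tmp = a.real * b.real - a.imag * b.imag
        self.oldb = b

    def csmul2(self, a, b):
        # 2 mul, 3 add: finish a*oldb and subtract b
        re = self.tmp - b.real
        im = a.imag * self.oldb.real + a.real * self.oldb.imag - b.imag
        return complex(re, im)

    def cadd(self, a, b):
        # 2 add
        return complex(a.real + b.real, a.imag + b.imag)


e0, e1, f0 = 1 + 2j, 5 + 6j, 3 + 4j
unit = ComplexUnit()
unit.csmul1(e0, f0)
o0 = unit.csmul2(e0, e1)      # o0 = e0*f0 - e1
o1 = unit.cadd(e0, e1)        # o1 = e0 + e1
print(o0 == e0 * f0 - e1, o1 == e0 + e1)
```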
cmul o0,e0,f0 // 4 MUL, 2 ADD
csub o0,o0,e1 // 2 ADD
cadd o1,e0,e1 // 2 ADD
o0 = e0*f0 – e1 // 4 MUL, 4 ADD, 4 ARG
o1 = e0 + e1 // 2 ADD, 3 ARG
(a+ib)(c+id) = (ac-bd) + i(ad+bc) 4MUL + 2ADD
On whiteboard
k1 = c(a+b)
k2 = a(d-c)
k3 = b(c+d)
(a+ib)(c+id) = (k1-k3) + i(k1+k2) 3MUL + 5ADD
On whiteboard
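The three-multiplication form can be checked against the direct four-multiplication form. Expanding gives k1 - k3 = ac + bc - bc - bd = ac - bd and k1 + k2 = ac + bc + ad - ac = ad + bc. A short Python check, with the illustrative function names `cmul4` and `cmul3`:

```python
def cmul4(a, b, c, d):
    """(a+ib)(c+id) with 4 MUL + 2 ADD; returns (real, imag)."""
    return a * c - b * d, a * d + b * c


def cmul3(a, b, c, d):
    """(a+ib)(c+id) with 3 MUL + 5 ADD; returns (real, imag)."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


for args in ((1, 2, 3, 4), (-7, 5, 2, -9), (0, 1, 0, 1)):
    print(cmul3(*args) == cmul4(*args))
```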
o0 = e0*f0 – e1
o1 = e0 + e1
3MUL + 9ADD
First butterfly, z = f0
MUL, ADD k:
    k1 = Re(f0)*[Re(e0)+Im(e0)]
    k2 = Re(e0)*[Im(z)-Re(z)]
    k3 = Im(e0)*[Re(z)+Im(z)]
ADD Re:
    y = k1 - Re(e1)
    Re(o0) = y – k3
    Re(o1) = Re(e0) + Re(e1)
ADD Im:
    x = k1 - Im(e1)
    Im(o0) = x + k2
    Im(o1) = Im(e0) + Im(e1)
Second butterfly (same operation on e2, e3 with outputs o2, o3), z = f1
MUL, ADD k:
    k1 = Re(f1)*[Re(e2)+Im(e2)]
    k2 = Re(e2)*[Im(z)-Re(z)]
    k3 = Im(e2)*[Re(z)+Im(z)]
ADD Re:
    y = k1 - Re(e3)
    Re(o2) = y – k3
    Re(o3) = Re(e2) + Re(e3)
ADD Im:
    x = k1 - Im(e3)
    Im(o2) = x + k2
    Im(o3) = Im(e2) + Im(e3)
On whiteboard
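The 3MUL + 9ADD schedule folds the subtraction of e1 into the intermediate values x and y, so no separate complex subtract is needed. The Python sketch below follows the k1, k2, k3, x, y steps for one butterfly; the function name `butterfly` is illustrative, and z stands for the twiddle factor (f0 or f1).

```python
def butterfly(e0, e1, z):
    """Return (o0, o1) with o0 = e0*z - e1 and o1 = e0 + e1."""
    # 3 MUL, 3 ADD
    k1 = z.real * (e0.real + e0.imag)
    k2 = e0.real * (z.imag - z.real)
    k3 = e0.imag * (z.real + z.imag)
    # 2 ADD: subtract e1 early
    x = k1 - e1.imag
    y = k1 - e1.real
    # 2 ADD: finish o0
    o0 = complex(y - k3, x + k2)
    # 2 ADD: o1
    o1 = complex(e0.real + e1.real, e0.imag + e1.imag)
    return o0, o1


e0, e1, f0 = 1 + 2j, 5 + 6j, 3 + 4j
o0, o1 = butterfly(e0, e1, f0)
print(o0 == e0 * f0 - e1, o1 == e0 + e1)
```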
acc0 e0,f0 // no output
acc1 o1,e0,e1
acc2 e0,e1 // no output
acc3 o0,e2,f1
acc1 o3,e2,e3
acc2 e2,e3 // no output
acc4 o2 // no input