Address Generation Unit
Hardware for addressing modes
AGU: calculates the addresses for the data memories
Inputs:
• Decoded instructions
• Data from RF
• Data in address register
Output:
• Address for data memory
Why AGU?
• Memory addressing operations and compute operations are often independent
– Offload ALU
– Computing and addressing can be done in parallel!
Sum += DM0[ptr0++] * DM1[ptr1++]
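The multiply-accumulate line above can be read as two address streams plus one compute operation. The following is a minimal Python sketch of that split; the function name `mac` and its parameters are illustrative assumptions, not part of the slides. The pointer updates stand for work an AGU would do, while the multiply-add stands for the ALU/MAC work.

```python
def mac(dm0, dm1, n, ptr0=0, ptr1=0):
    """Sum += DM0[ptr0++] * DM1[ptr1++], repeated n times."""
    total = 0
    for _ in range(n):
        # Addressing: post-increment pointers (the part an AGU takes over)
        a = dm0[ptr0]
        ptr0 += 1
        b = dm1[ptr1]
        ptr1 += 1
        # Computing: multiply-accumulate (the part left to the ALU/MAC)
        total += a * b
    return total


print(mac([1, 2, 3], [4, 5, 6], 3))  # 1*4 + 2*5 + 3*6 = 32
```

In software the pointer updates and the multiply-add run one after the other; in hardware with an AGU they are independent and can be done in the same cycle.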
Addressing Modes
Addressing                           Description                                           Example
Direct                               A <= K (constant from instruction)                    ld r0, (K)
Register indirect                    A <= Reg                                              ld r0, (r1)
Index                                A <= ARx + Reg                                        ld r0, (ar0, r1)
                                     A <= ARx + K                                          ld r0, (ar1, K)
Post incremental                     A <= ARx; ARx <= ARx + 1                              ld r0, (ar0++)
Post decremental                     A <= ARx; ARx <= ARx - 1                              ld r0, (ar0--)
Variable step size post incremental  A <= ARx; ARx <= ARx + STEP                           ld r0, [ar0+step]
Post incremental modulo addressing   A <= ARx; ARx <= (ARx == Top) ? Bottom : ARx + STEP   ld r0, (ar0++%)
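The update rules in the addressing modes table can be sketched in Python as follows. This is a behavioural illustration only: the class name `AddressRegister` and its method names are assumptions introduced here, and register widths and overflow are not modelled.

```python
class AddressRegister:
    """One address register ARx with the update rules from the table (sketch)."""

    def __init__(self, value=0, step=1, bottom=0, top=0):
        self.value = value    # ARx
        self.step = step      # STEP
        self.bottom = bottom  # Bottom of the circular buffer
        self.top = top        # Top of the circular buffer

    def index(self, offset):
        # Index: A <= ARx + Reg or A <= ARx + K; ARx is not changed
        return self.value + offset

    def post_inc(self):
        # A <= ARx; ARx <= ARx + 1
        a = self.value
        self.value += 1
        return a

    def post_dec(self):
        # A <= ARx; ARx <= ARx - 1
        a = self.value
        self.value -= 1
        return a

    def post_step(self):
        # A <= ARx; ARx <= ARx + STEP
        a = self.value
        self.value += self.step
        return a

    def post_mod(self):
        # A <= ARx; ARx <= (ARx == Top) ? Bottom : ARx + STEP
        a = self.value
        if self.value == self.top:
            self.value = self.bottom
        else:
            self.value += self.step
        return a


ar0 = AddressRegister(value=4, step=1, bottom=4, top=6)
print([ar0.post_mod() for _ in range(6)])  # [4, 5, 6, 4, 5, 6]
```

With this rule Top is the last address inside the buffer: the register wraps to Bottom after the address Top has been used.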
Exercise 5.1: Design an AGU
Solution 5.1
Operation                               OP    Ca Cb Cc Cd Ce Cf Cg Ch
AR0=RF                                  0000  1  0  0  0  0  -  -  -
AR1=RF                                  0001  0  1  0  0  0  -  -  -
STEP=RF                                 0010  0  0  0  0  1  -  -  -
NOP                                     0011  0  0  0  0  0  -  -  -
ADR=IMM                                 0100  0  0  0  0  0  -  0  -
ADR=RF                                  0101  0  0  0  0  0  -  1  -
ADR=AR0, AR0+=STEP                      0110  2  0  0  0  0  0  2  2
ADR=AR1, AR1+=STEP                      0111  0  2  0  0  0  0  2  2
ADR=AR0+IMM                             1000  0  0  0  0  0  -  3  3
ADR=AR1+IMM                             1001  0  0  0  0  0  -  3  3
ADR=AR0, AR0=(AR0+1==TOP)?BOTTOM:AR0+1  1010  2  0  0  0  0  1  2  1
ADR=AR1, AR1=(AR1+1==TOP)?BOTTOM:AR1+1  1011  0  2  0  0  0  1  2  1
NOP                                     1100  0  0  0  0  0  -  -  -
NOP                                     1101  0  0  0  0  0  -  -  -
BOTTOM=RF                               1110  0  0  1  0  0  -  -  -
TOP=RF                                  1111  0  0  0  1  0  -  -  -
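The operations listed in the table can be checked with a small behavioural model. The sketch below is an assumption-laden illustration in Python: it models only what each opcode does to the registers and the output address, not the control signals Ca to Ch or the datapath they drive, and the class and method names (`AGU`, `execute`) are hypothetical. Register widths are not modelled.

```python
class AGU:
    """Behavioural sketch of the 16 operations in the Solution 5.1 table."""

    def __init__(self):
        self.ar = [0, 0]  # AR0, AR1
        self.step = 0     # STEP
        self.bottom = 0   # BOTTOM
        self.top = 0      # TOP

    def execute(self, op, rf=0, imm=0):
        """Run one operation; return ADR, or None if no address is produced."""
        i = op & 1  # in the paired opcodes the lowest bit selects AR0 or AR1
        if op in (0b0000, 0b0001):      # ARi = RF
            self.ar[i] = rf
        elif op == 0b0010:              # STEP = RF
            self.step = rf
        elif op == 0b0100:              # ADR = IMM
            return imm
        elif op == 0b0101:              # ADR = RF
            return rf
        elif op in (0b0110, 0b0111):    # ADR = ARi, ARi += STEP
            adr = self.ar[i]
            self.ar[i] += self.step
            return adr
        elif op in (0b1000, 0b1001):    # ADR = ARi + IMM
            return self.ar[i] + imm
        elif op in (0b1010, 0b1011):    # ADR = ARi, modulo post increment
            adr = self.ar[i]
            nxt = adr + 1
            self.ar[i] = self.bottom if nxt == self.top else nxt
            return adr
        elif op == 0b1110:              # BOTTOM = RF
            self.bottom = rf
        elif op == 0b1111:              # TOP = RF
            self.top = rf
        return None                     # NOP and register writes


agu = AGU()
agu.execute(0b1110, rf=8)    # BOTTOM = 8
agu.execute(0b1111, rf=11)   # TOP = 11
agu.execute(0b0000, rf=8)    # AR0 = 8
print([agu.execute(0b1010) for _ in range(5)])  # [8, 9, 10, 8, 9]
```

Note that the comparison in opcodes 1010 and 1011 is `ARx+1 == TOP`, so in this instruction set TOP is the first address past the buffer.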
Exercise 5.2
Exercise 5.4 (challenging)
Same as exercise 5.2, except that you also have these constraints
Solution 5.2
loop 256:
    load r0, DM1[DM0[AR0++]]
    store DM1[AR1++],r0

loop 256:
    load r0, DM0[AR0++]
    load r0, DM1[r0]
    store DM1[AR1++],r0

Memory access: DM0 is read (R); DM1 is read and written (R, W)
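The three-instruction loop of Solution 5.2 can be simulated directly. The Python sketch below is illustrative: the function name `indexed_copy`, the memory layout in the example (indices in DM0, source data in the lower half of DM1, results in the upper half) and the returned instruction count are assumptions made here, and the read and write regions of DM1 are assumed not to overlap.

```python
def indexed_copy(dm0, dm1, ar0, ar1, n=256):
    """Simulate the loop; return the number of instructions issued in it."""
    for _ in range(n):              # loop 256
        r0 = dm0[ar0]               # load r0, DM0[AR0++]
        ar0 += 1
        r0 = dm1[r0]                # load r0, DM1[r0]
        dm1[ar1] = r0               # store DM1[AR1++], r0
        ar1 += 1
    return 3 * n                    # three instructions per element


# Hypothetical test data: DM0 holds indices that reverse the source order.
dm0 = [255 - i for i in range(256)]
dm1 = list(range(1000, 1256)) + [0] * 256
issued = indexed_copy(dm0, dm1, ar0=0, ar1=256)
print(dm1[256:260], issued)  # [1255, 1254, 1253, 1252] 768
```

DM1 is touched twice per element (one read, one write), which is why it is the busier of the two memories in this loop.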
Solution 5.4
in one instruction, carefully design for prolog and epilog
unrolling more than 10 times
repeat 32
    load r0, dm0[ar0++];
    load r1, dm1[r0]    load r0, dm0[ar0++];
    load r2, dm1[r0]    load r0, dm0[ar0++];
    load r3, dm1[r0]    load r0, dm0[ar0++];
    load r4, dm1[r0]    load r0, dm0[ar0++];
    load r5, dm1[r0]    load r0, dm0[ar0++];
    load r6, dm1[r0]    load r0, dm0[ar0++];
    load r7, dm1[r0]    load r0, dm0[ar0++];
    load r8, dm1[r0];
    store dm1[ar1++], r1;
    store dm1[ar1++], r2;
    store dm1[ar1++], r3;
    store dm1[ar1++], r4;
    store dm1[ar1++], r5;
    store dm1[ar1++], r6;
    store dm1[ar1++], r7;
    store dm1[ar1++], r8;
(16+2)*(256/8) = 576
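As a sanity check on the unrolled listing, the following Python sketch simulates it and compares the result with a plain element-by-element copy. It rests on assumptions made here: where a line holds two operations, the DM1 load uses the value of r0 from before the DM0 load on the same line overwrites it; the read and write regions of DM1 do not overlap; and the names `unrolled_copy` and `reference` and the test data are hypothetical.

```python
def unrolled_copy(dm0, dm1, ar0, ar1, n=256, unroll=8):
    """Simulate the 8x unrolled loop: 8 indexed loads, then 8 stores."""
    assert n % unroll == 0
    for _ in range(n // unroll):        # repeat 32
        regs = []                       # stands for r1 .. r8
        r0 = dm0[ar0]                   # load r0, dm0[ar0++]
        ar0 += 1
        for k in range(unroll):
            regs.append(dm1[r0])        # load r(k+1), dm1[r0]
            if k < unroll - 1:
                r0 = dm0[ar0]           # load r0, dm0[ar0++] (same line)
                ar0 += 1
        for value in regs:
            dm1[ar1] = value            # store dm1[ar1++], r(k+1)
            ar1 += 1


def reference(dm0, dm1, ar0, ar1, n=256):
    """Plain copy: DM1[ar1 + i] = DM1[DM0[ar0 + i]]."""
    for i in range(n):
        dm1[ar1 + i] = dm1[dm0[ar0 + i]]


dm0 = [(7 * i) % 256 for i in range(256)]       # hypothetical index table
a = [i * i for i in range(256)] + [0] * 256
b = list(a)
unrolled_copy(dm0, a, ar0=0, ar1=256)
reference(dm0, b, ar0=0, ar1=256)
print(a == b)                    # True
print((16 + 2) * (256 // 8))     # 576, the figure given above
```

Each pass of the loop makes 16 accesses to DM1 (8 reads and 8 writes), which matches the 16 in the expression above.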
prologue:
    load r0, DM0[AR0++]
    load r0, DM1[r0]
loop 255:
    loadstore r0,DM0[AR0++], DM1[AR1++],r0
    load r0, DM1[r0]
epilogue:
    store DM1[AR1++],r0
Memory access: DM0 is read (R); DM1 is read and written (R, W). The loadstore instruction reads DM0 and writes DM1 in the same instruction.
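The software-pipelined schedule with `loadstore` can be simulated in the same way. The Python sketch below is a hedged illustration: it assumes that `loadstore` stores the old value of r0 to DM1[AR1++] while loading the next index from DM0[AR0++] into r0, that the read and write regions of DM1 do not overlap, and that counting issued instructions is a fair stand-in for cost. The function name `pipelined_copy` and the test data are hypothetical.

```python
def pipelined_copy(dm0, dm1, ar0, ar1, n=256):
    """Simulate prologue, loop (n - 1 times) and epilogue; count instructions."""
    issued = 0

    # prologue
    r0 = dm0[ar0]                  # load r0, DM0[AR0++]
    ar0 += 1
    r0 = dm1[r0]                   # load r0, DM1[r0]
    issued += 2

    for _ in range(n - 1):         # loop 255
        # loadstore r0, DM0[AR0++], DM1[AR1++], r0
        old = r0                   # value produced by the previous iteration
        r0 = dm0[ar0]              # read DM0 ...
        ar0 += 1
        dm1[ar1] = old             # ... and write DM1 in the same instruction
        ar1 += 1
        r0 = dm1[r0]               # load r0, DM1[r0]
        issued += 2

    # epilogue
    dm1[ar1] = r0                  # store DM1[AR1++], r0
    issued += 1
    return issued


dm0 = [255 - i for i in range(256)]             # hypothetical index table
dm1 = list(range(1000, 1256)) + [0] * 256
issued = pipelined_copy(dm0, dm1, ar0=0, ar1=256)
print(dm1[256:260], issued)  # [1255, 1254, 1253, 1252] 513
```

Under these assumptions the schedule issues 2 + 2*255 + 1 = 513 instructions for 256 elements, with the store of one element overlapped with the index load of the next.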