
Program Flow Control Units


Academic year: 2021


(1)

4. Design of Program Flow Control Units

Olle Seger, Andreas Ehliar, Jian Wang, Andréas Karlsson, Oscar Gustafsson

(2)

Program Flow Control Unit

• What instruction to execute next?

• Easy! If it weren’t for...

– Hardware repeat (simple to handle)
– Unconditional jumps
– Conditional jumps
– Call/return
– Program counter as destination operand
– ...

• Can be handled in many ways:

– Flush pipeline
– Use delay slots
– ...

(3)

Speculative techniques

• Branch prediction

– "Guess" the outcome of a branch
– Static techniques (based on heuristics)
– Dynamic techniques (based on history)
– Specialized predictors, e.g. loop predictors

• Branch target prediction

– "Guess" the branch target before it is known/calculated

• Return address buffering

– Keep a small stack of return addresses in faster registers (in addition to the stack in memory)

• (Not part of this course)
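Dynamic prediction is not part of this course. For reference, here is a minimal Python sketch of one classic history-based scheme: a 2-bit saturating counter. It is a standard textbook technique, not something taken from these slides, and the class and function names are illustrative.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, counter=0):
        self.counter = counter

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(self.counter + 1, 3)
        else:
            self.counter = max(self.counter - 1, 0)


def count_mispredictions(outcomes, predictor):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch that is taken 9 times and then falls through, executed twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(outcomes, TwoBitPredictor()))  # 4
```

The two bits of hysteresis mean that the single fall-through at the end of the first pass costs only one miss. The predictor still predicts "taken" when the loop is re-entered.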

(4)

Finite State Machines

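• Moore
• Mealy

[Figure: block diagrams of a Moore machine (output logic G driven by the state only) and a Mealy machine (output logic G driven by both the state and the input); F is the next-state logic]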

Which should we use for our program flow control unit?

(5)

Finite State Machines

• Lets us react immediately to an incoming flow control instruction!

• Beware of combinational loops (feedback paths from output to input)...

• Mealy!!!

[Figure: Mealy machine with state register, next-state logic F, and output logic G fed by both the state and the input]
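A minimal behavioral sketch in Python makes the difference concrete. As in the figures, F is the next-state function and G is the output function. The one-bit "jump detector" machine and the instruction names are hypothetical. They were chosen only to show that the Mealy output reacts in the same cycle as the input, while the Moore output lags by one cycle.

```python
def run_moore(inputs, next_state, output, state):
    """Moore: the output G(state) depends on the current state only."""
    outputs = []
    for x in inputs:
        outputs.append(output(state))
        state = next_state(state, x)  # F: the new state is registered at the clock edge
    return outputs


def run_mealy(inputs, next_state, output, state):
    """Mealy: the output G(state, x) also depends on the current input."""
    outputs = []
    for x in inputs:
        outputs.append(output(state, x))
        state = next_state(state, x)
    return outputs


# Hypothetical one-bit machine: the state remembers "the last input was a jmp".
def last_was_jmp(state, x):
    return x == "jmp"


def moore_g(state):
    return "nop" if state else "-"


def mealy_g(state, x):
    return "nop" if x == "jmp" else "-"


program = ["add", "jmp", "xxx"]
moore = run_moore(program, last_was_jmp, moore_g, False)
mealy = run_mealy(program, last_was_jmp, mealy_g, False)
print(moore)  # ['-', '-', 'nop']: reacts one cycle after the jmp
print(mealy)  # ['-', 'nop', '-']: reacts in the same cycle as the jmp
```

The price of the earlier reaction is the combinational path from input to output. This path is where the loops warned about above can form.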

(6)

A pipelined CPU

• Similar (but not identical) to Senior

• The PC FSM controls the jump instructions

Inputs:
– jmpdec = decoded instruction (what kind, number of delay slots)
– JT = jump taken? (deduced from IR2 and the flags)

Outputs:
– nextPC = target address, ++, or stall
– forcenop = insert a hardware nop

Implement:
– jmp ds0/ds1 addr
– jmp.eq ds0/ds1 addr

Let's try this program:

0: add (sets flags)
1: jmp.eq ds0 5
2: xxx

[Figure: pipeline snapshot (IR ... IR3, JT): add has been fetched; the FSM outputs nextPC = ++ and no nop]

(7)

[Figure: pipeline snapshots: jmp.eq ds0 5 moves down the pipeline while the FSM stalls the PC and inserts nops, until the jump decision is made ("jmp!")]

(8)

[Figure: the jump decision. Jump taken (JT = 1): nextPC = TA (target address) and a nop is inserted. Jump NOT taken (JT = 0): nextPC = ++ and no nop.]

(9)

Pipeline diagram, ds0

Program:
0: add
1: jmp.eq ds0 5
2: xxx
3: yyy
4: zzz
5: and
6: www

Jump taken:
PC  IR   IR1  IR2  IR3 ...
0
1   add
2   jmp  add
2   nop  jmp  add
2   nop  nop  jmp  add
5   nop  nop  nop  jmp
6   and  nop  nop  nop

Jump not taken:
PC  IR   IR1  IR2  IR3 ...
0
1   add
2   jmp  add
2   nop  jmp  add
2   nop  nop  jmp  add
3   xxx  nop  nop  jmp
4   yyy  xxx  nop  nop

(10)

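State graph for ds0

– State 0 → state 0: not jmp (PC++)
– State 0 → state 1: ds0 (nop, stall PC)
– State 1 (jmp in IR1) → state 2: - (nop, stall PC)
– State 2 (jmp in IR2) → state 3: JT (nop, PC=Target), not JT (PC++)
– State 3 (jmp in IR3) → state 0: - (PC++)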
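The ds0 graph can be written down directly as a transition table. In this minimal Python sketch, each input symbol is assumed to be what the FSM sees in one cycle: "ds0" for a decoded jmp without delay slots, "JT"/"not JT" for the jump decision, and "-" for don't care. The names DS0_FSM and run_fsm are hypothetical. Feeding it cycles 1 to 5 of the ds0 pipeline diagram (add in IR, then the jump) gives the stall/nop/target sequence seen there.

```python
# Transition table: (state, input) -> (next state, outputs).
# "-" marks a transition that does not depend on the input.
DS0_FSM = {
    (0, "no jmp"): (0, ("PC++",)),
    (0, "ds0"):    (1, ("nop", "stall PC")),
    (1, "-"):      (2, ("nop", "stall PC")),   # jmp in IR1
    (2, "JT"):     (3, ("nop", "PC=Target")),  # jmp in IR2, decision known
    (2, "not JT"): (3, ("PC++",)),
    (3, "-"):      (0, ("PC++",)),             # jmp in IR3
}


def run_fsm(fsm, inputs, state=0):
    """Step the Mealy FSM once per input and collect the outputs of every step."""
    outputs = []
    for x in inputs:
        key = (state, x) if (state, x) in fsm else (state, "-")
        state, out = fsm[key]
        outputs.append(out)
    return outputs


taken = run_fsm(DS0_FSM, ["no jmp", "ds0", "-", "JT", "-"])
not_taken = run_fsm(DS0_FSM, ["no jmp", "ds0", "-", "not JT", "-"])
for step in taken:
    print(step)
```

A table like this maps one-to-one onto a case statement in an HDL, which is one reason state graphs are usually drawn first.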

(11)

Pipeline diagram, ds1

Program:
0: add
1: jmp.eq ds1 5
2: xxx
3: yyy
4: zzz
5: and
6: www

Jump taken:
PC  IR   IR1  IR2  IR3 ...
0
1   add
2   jmp  add
3   xxx  jmp  add
3   nop  xxx  jmp  add
5   nop  nop  xxx  jmp
6   and  nop  nop  xxx

Jump not taken:
PC  IR   IR1  IR2  IR3 ...
0
1   add
2   jmp  add
3   xxx  jmp  add
3   nop  xxx  jmp  add
4   yyy  nop  xxx  jmp
5   zzz  yyy  nop  xxx

(12)

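State graph for ds0 and ds1

– State 0 → state 0: not jmp (PC++)
– State 0 → state 1: ds0 (nop, stall PC)
– State 0 → state 4: ds1 (PC++)
– State 1 (jmp in IR1) → state 2: - (nop, stall PC)
– State 4 (jmp in IR1, ds1) → state 2: - (nop, stall PC)
– State 2 (jmp in IR2) → state 3: JT (nop, PC=Target), not JT (PC++)
– State 3 (jmp in IR3) → state 0: - (PC++)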

(13)

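State graph (simplified)

– State 0 → state 0: not jmp (PC++)
– State 0 → state 1: ds0 (nop, stall PC)
– State 0 → state 1: ds1 (PC++)
– State 1 → state 2: - (nop, stall PC)
– State 2 → state 0: JT (nop, PC=Target), not JT (PC++)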
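Here is a cycle-level Python sketch of the simplified graph driving the pipeline. It rests on several assumptions:

– jumps are written "jmp.eq dsN target"
– JT is a fixed input that is known when the jump reaches IR2
– a forced nop replaces the fetched instruction
– a stalled PC simply refetches the same address

The function name simulate is hypothetical. Under these assumptions it reproduces the PC, IR, IR1, IR2 and IR3 columns of the ds0 and ds1 pipeline diagrams for both jump outcomes.

```python
NOP = "nop"


def simulate(program, jump_taken, cycles=7):
    """Cycle-level model of the simplified PC FSM (states 0, 1, 2).

    program    -- instruction strings; jumps look like "jmp.eq ds0 5"
    jump_taken -- the JT input, assumed known when the jump reaches IR2
    Returns one (PC, IR, IR1, IR2, IR3) tuple per cycle; None = empty stage.
    """
    pc, ir, ir1, ir2, ir3 = 0, None, None, None, None
    state, target = 0, None
    rows = []
    for _ in range(cycles):
        rows.append((pc, ir, ir1, ir2, ir3))
        is_jump = ir is not None and ir.startswith("jmp")
        # Mealy outputs (forcenop, nextPC) depend on the state AND on IR.
        if state == 0 and is_jump:
            _, ds, addr = ir.split()
            target, state = int(addr), 1
            if ds == "ds0":
                force_nop, next_pc = True, pc        # ds0(nop, stall PC)
            else:
                force_nop, next_pc = False, pc + 1   # ds1(PC++): fetch the delay slot
        elif state == 1:
            force_nop, next_pc, state = True, pc, 2  # -(nop, stall PC)
        elif state == 2:
            state = 0
            if jump_taken:
                force_nop, next_pc = True, target    # JT(nop, PC=Target)
            else:
                force_nop, next_pc = False, pc + 1   # not JT(PC++)
        else:
            force_nop, next_pc = False, pc + 1       # not jmp(PC++)
        # Clock edge: shift the pipeline, fetch (or force a nop), update PC.
        ir3, ir2, ir1 = ir2, ir1, ir
        ir = NOP if force_nop else program[pc]
        pc = next_pc
    return rows


program = ["add", "jmp.eq ds0 5", "xxx", "yyy", "zzz", "and", "www"]
for row in simulate(program, jump_taken=True):
    print(row)
```

Changing "ds0" to "ds1" in the program shows the effect of the delay slot. xxx is fetched instead of a nop, and one stall cycle disappears.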

(14)

Exercises!

(15)

4.1

(16)

4.1

(17)

4.1a

0: jmp 5
1: xxx
2: yyy
3:
4:
5: zzz

1 delay slot => xxx will be executed
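A small Python trace shows the effect of the delay slot. It assumes that every "jmp N" is unconditional and that exactly delay_slots instructions after it always execute. The helper trace is hypothetical.

```python
def trace(program, delay_slots, max_steps=20):
    """Return the instructions in execution order.

    program maps address -> instruction; every "jmp N" is unconditional and
    is followed by delay_slots instructions that always execute.
    """
    pc, countdown, target, executed = 0, None, None, []
    while pc in program and len(executed) < max_steps:
        instr = program[pc]
        executed.append(instr)
        if instr.startswith("jmp"):
            countdown, target = delay_slots, int(instr.split()[1])
        elif countdown is not None:
            countdown -= 1
        if countdown == 0:
            pc, countdown = target, None  # delay slots done: redirect the fetch
        else:
            pc += 1
    return executed


program = {0: "jmp 5", 1: "xxx", 2: "yyy", 5: "zzz"}  # addresses 3 and 4 are empty
print(trace(program, delay_slots=1))  # ['jmp 5', 'xxx', 'zzz']
print(trace(program, delay_slots=0))  # ['jmp 5', 'zzz']
```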

(18)

4.1b

add r3,r0,r1
add r3,r2,r3

becomes

add r3,r0,r1
nop
add r3,r2,r3

"Writeback to RF" is inside the register file
=> 1 nop
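Under the rule from 4.1b, a result can be read by the second instruction after the one that writes it, so dependent neighbours need one nop between them. This nop padding can be automated. The following is a minimal Python sketch with several assumptions. The destination-first operand convention and the opcode set are illustrative. Loads are not modelled, since their latency is not given here. The helper names are hypothetical.

```python
import re

ALU_OPS = ("add", "sub", "set")  # assumed: first register operand is the destination


def operands(instr):
    """Split an instruction into (destination register or None, source registers)."""
    regs = re.findall(r"\br\d+\b", instr)
    if instr.split()[0] in ALU_OPS and regs:
        return regs[0], regs[1:]
    return None, regs  # nop, jumps, stores, ...: no register destination here


def too_close(out, sources, min_distance):
    """True if one of the last (min_distance - 1) instructions writes a source."""
    recent = out[len(out) - (min_distance - 1):] if min_distance > 1 else []
    return any(operands(prev)[0] in sources for prev in recent)


def insert_nops(program, min_distance=2):
    out = []
    for instr in program:
        _, sources = operands(instr)
        while too_close(out, sources, min_distance):
            out.append("nop")  # wait for write-back
        out.append(instr)
    return out


print(insert_nops(["add r3,r0,r1", "add r3,r2,r3"]))
# ['add r3,r0,r1', 'nop', 'add r3,r2,r3']
```

The same rule produces the "Wait for write-back" nops in the first 4.1c listing, e.g. between set r2, 10 and jump.lt r0, r2, skip.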

(19)

4.1c

0: set r2,10
1: nop
2: jump.lt r0,r2,20
3: xxx ; Delay slot 1
4: yyy ; Delay slot 2
5: zzz ; Delay slot 3
…
20:

(20)

4.1c

set r2, 10
nop ; Wait for write-back
jump.lt r0, r2, skip
nop ; Delay slot 1
nop ; Delay slot 2
nop ; Delay slot 3
set r3, 55
nop ; Wait for write-back
add r0, r0, r3
jump.eq r14, r14, endprog ; No need for unconditional jump
nop ; Delay slot 1
nop ; Delay slot 2
nop ; Delay slot 3
skip:
set r3, 48
nop ; Wait for write-back
add r0, r0, r3
set r3, 1
nop ; Wait for write-back
jump.neq r1, r3, endprog
nop ; Delay slot 1
nop ; Delay slot 2
nop ; Delay slot 3
set r3, 32
nop ; Wait for write-back
add r0, r0, r3
endprog:

25 instructions

(21)

4.1c

set r2, 10
set r3, 55 ; May not be used, but in case
jump.lt r0, r2, skip
nop ; Delay slot 1
nop ; Delay slot 2
nop ; Delay slot 3
jump.eq r14, r14, endprog ; No need for unconditional jump
add r0, r0, r3
nop ; Delay slot 2
nop ; Delay slot 3
skip:
set r3, 48
nop ; Wait for write-back
add r0, r0, r3
set r3, 1
nop ; Wait for write-back
jump.neq r1, r3, endprog
set r3, 32
nop ; Delay slot 2
nop ; Delay slot 3
add r0, r0, r3
endprog:

20 instructions

(22)

4.1c

set r2, 10
set r3, 55 ; May not be used, but in case
jump.lt r0, r2, skip
nop ; Delay slot 1
nop ; Delay slot 2
nop ; Delay slot 3
jump.eq r14, r14, endprog ; No need for unconditional jump
add r0, r0, r3
nop ; Delay slot 2
nop ; Delay slot 3
skip:
set r4, 1
set r3, 48
jump.neq r1, r4, endprog
add r0, r0, r3
set r3, 32
nop ; Delay slot 3
add r0, r0, r3
endprog:

17 instructions

(23)

4.1c

set r2, 10
set r3, 55
jump.lt r0, r2, skip
set r4, 48
set r5, 1
nop ; Delay slot 3
jump.eq r14, r14, endprog ; No need for unconditional jump
add r0, r0, r3
nop ; Delay slot 2
nop ; Delay slot 3
skip:
jump.neq r1, r5, endprog
add r0, r0, r4
set r3, 32
nop ; Delay slot 3
add r0, r0, r3
endprog:

15 instructions
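A minimal Python interpreter can check that the rescheduled versions still compute the same thing. It runs a listing with three delay slots after every jump and rests on several assumptions:

– there are 16 registers, initialized to 0, so jump.eq r14, r14 is always taken
– the jump conditions are plain integer comparisons
– write-back latency is ignored, since the listings are assumed to respect it already

The reference function is read off the 25-instruction version. The names parse, run, reference and PROG_15 are hypothetical.

```python
import operator

# Branch conditions used in the listings (assumed: plain integer comparisons).
CONDITIONS = {"jump.lt": operator.lt, "jump.eq": operator.eq, "jump.neq": operator.ne}


def parse(asm):
    """Return (instructions, labels); ';' starts a comment, 'name:' is a label."""
    code, labels = [], {}
    for line in asm.splitlines():
        line = line.split(";")[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(code)
        else:
            op, _, args = line.partition(" ")
            code.append((op, [a.strip() for a in args.split(",")] if args else []))
    return code, labels


def reg_index(name):
    return int(name[1:])  # "r14" -> 14


def run(asm, r0, r1, delay_slots=3):
    """Execute the listing with a fixed number of delay slots; return r0."""
    code, labels = parse(asm)
    regs = [0] * 16  # assumed: r2..r15 start at 0
    regs[0], regs[1] = r0, r1
    pc, resolve_at, target = 0, None, None
    while pc < len(code):
        op, a = code[pc]
        if op == "set":
            regs[reg_index(a[0])] = int(a[1])
        elif op == "add":
            regs[reg_index(a[0])] = regs[reg_index(a[1])] + regs[reg_index(a[2])]
        elif op in CONDITIONS:
            taken = CONDITIONS[op](regs[reg_index(a[0])], regs[reg_index(a[1])])
            resolve_at = pc + delay_slots  # index of the last delay slot
            target = labels[a[2]] if taken else None
        # "nop" changes nothing.
        if pc == resolve_at:
            resolve_at = None
            if target is not None:
                pc = target
                continue
        pc += 1
    return regs[0]


def reference(r0, r1):
    """Intended behaviour, read off the 25-instruction version."""
    if r0 < 10:
        r0 += 48
        if r1 == 1:
            r0 += 32
    else:
        r0 += 55
    return r0


PROG_15 = """
set r2, 10
set r3, 55
jump.lt r0, r2, skip
set r4, 48
set r5, 1
nop ; Delay slot 3
jump.eq r14, r14, endprog
add r0, r0, r3
nop ; Delay slot 2
nop ; Delay slot 3
skip:
jump.neq r1, r5, endprog
add r0, r0, r4
set r3, 32
nop ; Delay slot 3
add r0, r0, r3
endprog:
"""

print(run(PROG_15, 3, 1))  # 83
print(all(run(PROG_15, r0, r1) == reference(r0, r1)
          for r0 in range(-5, 20) for r1 in range(-1, 3)))  # True
```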

(24)

4.1c

set r13, 0 ; Constant 0
set r14, 1 ; Constant 1
set r15, 36 ; Loop counter
loop:
ld r2, [r0]
add r0, r0, r14
st [r1], r2
add r1, r1, r14
sub r15, r15, r14
nop ; Wait for write-back
jump.neq r15, r13, loop
nop ; Delay slot 1
nop ; Delay slot 2
nop ; Delay slot 3

3 + 36 × 10 = 363 cycles

(25)

4.1c

set r13, 0 ; Constant 0
set r14, 1 ; Constant 1
set r15, 36 ; Loop counter
loop:
sub r15, r15, r14
ld r2, [r0]
jump.neq r15, r13, loop
add r0, r0, r14
st [r1], r2
add r1, r1, r14

3 + 36 × 6 = 219 cycles

(26)

4.1c unrolled loop

set r13, 0 ; Constant 0
set r14, 1 ; Constant 1
set r15, 18 ; Loop counter
loop:
ld r2, [r0]
add r0, r0, r14
st [r1], r2
add r1, r1, r14
sub r15, r15, r14
ld r2, [r0]
jump.neq r15, r13, loop
add r0, r0, r14
st [r1], r2
add r1, r1, r14

3 + 18 × 10 = 183 cycles
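The cycle counts follow from executing one instruction per cycle with no stalls. Each iteration therefore costs as many cycles as the loop body has instructions, delay slots included. The three set instructions add a fixed 3 cycles. Here is a minimal Python check, with the loop bodies copied from the listings; the names are illustrative.

```python
# Loop bodies copied from the three listings; the setup is three "set" instructions.
NAIVE = ["ld r2, [r0]", "add r0, r0, r14", "st [r1], r2", "add r1, r1, r14",
         "sub r15, r15, r14", "nop", "jump.neq r15, r13, loop", "nop", "nop", "nop"]
SCHEDULED = ["sub r15, r15, r14", "ld r2, [r0]", "jump.neq r15, r13, loop",
             "add r0, r0, r14", "st [r1], r2", "add r1, r1, r14"]
UNROLLED = ["ld r2, [r0]", "add r0, r0, r14", "st [r1], r2", "add r1, r1, r14",
            "sub r15, r15, r14", "ld r2, [r0]", "jump.neq r15, r13, loop",
            "add r0, r0, r14", "st [r1], r2", "add r1, r1, r14"]


def total_cycles(body, iterations, setup=3):
    """One instruction per cycle and no stalls: the delay slots are part of
    the body, so every iteration costs len(body) cycles."""
    return setup + iterations * len(body)


for name, body, n in [("naive", NAIVE, 36), ("scheduled", SCHEDULED, 36),
                      ("unrolled", UNROLLED, 18)]:
    print(name, total_cycles(body, n), "cycles")
```

This gives 363, 219 and 183 cycles. Unrolling halves the number of iterations, and with it the loop overhead (the sub and the jump). Meanwhile, the extra copy of the body fills the delay slots with useful work.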

(27)

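4.1d

State graph:
– State 0 → state 0: not jmp (PC++)
– State 0 → state 1: jmp (PC++)
– State 1 → state 2: - (PC++)
– State 2 → state 3: - (PC++)
– State 3 → state 0: JT (PC=PFC_DATA_2), not JT (PC++)

Simplified to a single state 0: JT (PC=PFC_DATA_2), not JT (PC++)

jmp = PFC_OP[4]
JT = PFC_OP2[4]*(PC_FSM_EQUAL*(PFC_OP2[1:0]==0) + …)
PC = JT ? PFC_DATA2 : PC + 1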
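Here is a minimal Python sketch of the resulting PC logic. Only the "equal" term of JT is modelled, because the remaining terms are elided on the slide. The bit positions follow the slide (PFC_OP2[4], PFC_OP2[1:0]). The example opcode value and the helper names are hypothetical, and the PC width is not modelled.

```python
def bits(value, hi, lo):
    """Extract value[hi:lo] (inclusive), like a Verilog part-select."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def jump_taken(pfc_op2, pc_fsm_equal):
    """JT = PFC_OP2[4] * (PC_FSM_EQUAL * (PFC_OP2[1:0] == 0) + ...).

    Only the "equal" term from the slide is modelled; the other condition
    terms are elided there and are left out here.
    """
    return bool(bits(pfc_op2, 4, 4) and pc_fsm_equal and bits(pfc_op2, 1, 0) == 0)


def next_pc(pc, jt, pfc_data_2):
    """PC = JT ? PFC_DATA2 : PC + 1, a plain multiplexer with no FSM state."""
    return pfc_data_2 if jt else pc + 1


op = 0b10000  # hypothetical encoding: bit 4 set (jump), condition bits 00 ("equal")
print(next_pc(pc=7, jt=jump_taken(op, pc_fsm_equal=True), pfc_data_2=20))   # 20
print(next_pc(pc=7, jt=jump_taken(op, pc_fsm_equal=False), pfc_data_2=20))  # 8
```

The FSM disappears because the delay slots are filled by software and JT already includes PFC_OP2[4]. The PC can therefore always advance, and the only decision is which of the two values the multiplexer selects.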

(28)

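4.1d Program Counter Module

[Figure: Program Counter module: PC register, +1 incrementer, and a multiplexer selecting the next PC; the jump decision is formed (AND gate) from PFC_OP[4] and PFC_OP[1:0]]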

(29)

4.3

(30)

4.3
