Solution proposal for the TSEA26 exam on 2012-10-26 (v1.0)


Andreas Ehliar November 13, 2012

Solution proposal for question 1

[Figure: schematic for question 1, drawn in two parts: "Address registers and modulo addressing registers" and "Address calculation/address output". Labelled signals: AR0, AR1, ARx[15:0], AR_WB, From_RF, TOP, BOT, TO_DM[15:0]. Control signals: Cw, Csel, Ctop, Cbot, Cstep (step of 1 or -1), Cmod, Cpost. Register write enables: Cw && (x == 0) and Cw && (x == 1).]

Control table

Operation   Cw  Cstep  Csel  Cpost  Cmod  Ctop  Cbot
ARx++       1   0      0     1      0     0     0
--ARx       1   1      0     0      0     0     1
ARx%++      1   0      0     1      1     0     0
Load ARx    1   -      1     -      -     0     0
Load BOT    0   -      -     -      -     1     0
Load TOP    0   -      -     -      -     0     1

NOTE: x is an input from the instruction decoder and determines whether AR0 or AR1 is used.
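
As a rough illustration of how the control signals interact, the following Python sketch models one address register together with TOP and BOT. It is a behavioural approximation, not the schematic itself: the class name Agu, the method name step, and in particular the wrap rule (jump to BOT when ARx equals TOP) are assumptions read off the signal names in the figure.

```python
MASK16 = 0xFFFF  # address registers are 16 bits wide (ARx[15:0])


class Agu:
    """Behavioural sketch of one address register with TOP and BOT (hypothetical model)."""

    def __init__(self, ar=0, top=0, bot=0):
        self.ar = ar & MASK16
        self.top = top & MASK16
        self.bot = bot & MASK16

    def step(self, cw, cstep, csel, cpost, cmod, from_rf=0):
        """Apply one control word and return the address sent to the data memory."""
        # Cstep selects the step: 0 -> +1, 1 -> -1.
        stepped = (self.ar + (-1 if cstep else 1)) & MASK16
        # Cmod (assumed rule): wrap to BOT once ARx has reached TOP.
        if cmod and self.ar == self.top:
            stepped = self.bot
        # Cpost: 1 -> output the old value (post), 0 -> output the new value (pre).
        address = self.ar if cpost else stepped
        # Cw enables the write; Csel: 1 -> value from the register file, 0 -> stepped value.
        if cw:
            self.ar = (from_rf & MASK16) if csel else stepped
        return address


# ARx%++ over a three-word buffer occupying addresses 4..6 (BOT=4, TOP=6).
agu = Agu(ar=5, top=6, bot=4)
addresses = [agu.step(cw=1, cstep=0, csel=0, cpost=1, cmod=1) for _ in range(4)]
```

With these assumptions the four generated addresses are 5, 6, 4, 5, which is the behaviour expected from a circular buffer.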


Solution proposal for question 2

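[Figure: schematic for question 2. Labelled signals: {A[15],A}, {B[15],B}, {1'b0,B}, A[15]. Control signals: Ca, Cb, Cc, Cpostop. The internal buses are 17 bits wide. Post-operation blocks: TRUNC, SAT and /2. Output: RESULT[15:0].]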

Control table:

Operation               Ca  Cb  Cc  Cpostop
A + B                   0   1   1   0
A - B                   0   3   2   0
SAT(A + B)              0   1   1   1
SAT(A - B)              0   3   2   1
(A + B) /2              0   1   1   2
SAT(|A|)                1   0   0   1
|A| (unsigned result)   1   0   0   0
SAT(unsigned(B)-|A|)    2   2   0   1

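// SAT: saturate the 17-bit adder result to 16 bits
always @* begin
  case (in[16:15])
    2'b00: out = in[15:0];   // Positive, fits in 16 bits
    2'b01: out = 16'h7fff;   // Positive overflow
    2'b10: out = 16'h8000;   // Negative overflow
    2'b11: out = in[15:0];   // Negative, fits in 16 bits
  endcase
end

// TRUNC
assign out[15:0] = in[15:0];

// /2
assign out[15:0] = in[16:1];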
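
The three post-operations can also be checked with a small behavioural model. The Python sketch below mirrors the Verilog fragments above on 17-bit values passed as non-negative integers; the function names sat, trunc and half are chosen here for illustration only.

```python
def sat(value17):
    """Saturate a 17-bit two's complement value (0..0x1FFFF) to 16 bits."""
    top_two = (value17 >> 15) & 0b11  # corresponds to in[16:15]
    if top_two == 0b01:
        return 0x7FFF  # positive overflow
    if top_two == 0b10:
        return 0x8000  # negative overflow
    return value17 & 0xFFFF  # 2'b00 and 2'b11: value already fits


def trunc(value17):
    """Keep the 16 least significant bits (in[15:0])."""
    return value17 & 0xFFFF


def half(value17):
    """Drop the least significant bit (in[16:1])."""
    return (value17 >> 1) & 0xFFFF
```

For example, 0x08000 is +32768 in 17 bits and saturates to 0x7fff, while 0x17fff is -32769 and saturates to 0x8000.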

a) S(|A| − |B|) and S(|A − B|) cannot be implemented in a single clock cycle when only one adder is allowed.

b) There are many ways to implement these operations, but the most straightforward way is, as hinted in the exam, to implement a 17-bit-wide temporary register that can hold intermediate results. For example:

SAT(|A-B|):

  TMP = A - B               // New instruction 1
  RES = SAT(ABS(TMP))       // New instruction 2

SAT(|A|-|B|):

  TMP = ABS(A)              // New instruction 3
  RES = SAT(TMP - ABS(B))   // New instruction 4
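
To see why the temporary register needs 17 bits, the two sequences can be modelled in Python. This is only a sketch: sat16, fits_signed, sat_abs_diff and sat_diff_abs are hypothetical helper names, and A and B are assumed to be 16-bit two's complement values.

```python
def sat16(x):
    """Saturate an integer to the 16-bit two's complement range."""
    return max(-32768, min(32767, x))


def fits_signed(x, bits):
    """True if x is representable as a two's complement number with the given width."""
    return -(1 << (bits - 1)) <= x < (1 << (bits - 1))


def sat_abs_diff(a, b):
    tmp = a - b                 # New instruction 1
    assert fits_signed(tmp, 17), "TMP must be 17 bits wide"
    return sat16(abs(tmp))      # New instruction 2


def sat_diff_abs(a, b):
    tmp = abs(a)                # New instruction 3
    assert fits_signed(tmp, 17), "TMP must be 17 bits wide"
    return sat16(tmp - abs(b))  # New instruction 4
```

The extreme case A = -32768, B = 32767 gives TMP = -65535, which fits in 17 bits but not in 16.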

r0 = SAT(ABS(r0))

SAT(|A|-|B|):

r0 = ABS(r1)                   // Note that no SAT is used. However, if we assume that r0
                               // is unsigned this will work.
r1 = SAT(unsigned(r1)-ABS(B))  // This operation assumes that r1 is unsigned

For the first variant a short motivation is in order to demonstrate that SAT(|x|) = SAT(|SAT(x)|): (We assume that we saturate to a 16 bit two’s complement number below.)

Case 1: x > 32767

  SAT(|x|) = 32767
  SAT(|SAT(x)|) = SAT(|32767|) = 32767 // Correct

Case 2: -32768 < x <= 32767

  SAT(|x|) = |x|
  SAT(|SAT(x)|) = SAT(|x|) = |x|

Case 3: x = -32768

  SAT(|x|) = SAT(|-32768|) = SAT(32768) = 32767
  SAT(|SAT(x)|) = SAT(|-32768|) = SAT(32768) = 32767

Case 4: x < -32768

  SAT(|x|) = 32767
  SAT(|SAT(x)|) = SAT(|-32768|) = SAT(32768) = 32767
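
The four cases can also be confirmed by brute force. The following Python sketch checks the identity for every value in the 17-bit two's complement range, which is assumed here to be the range of intermediate results; sat16 is a hypothetical helper name.

```python
def sat16(x):
    """Saturate an integer to the 16-bit two's complement range."""
    return max(-32768, min(32767, x))


# Every 17-bit two's complement value: -65536 .. 65535.
mismatches = [
    x
    for x in range(-(1 << 16), 1 << 16)
    if sat16(abs(x)) != sat16(abs(sat16(x)))
]
```

The list mismatches is empty, in agreement with the case analysis above.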

I also saw a few variants on these two operations that I liked. For example, the following is a pretty neat implementation of S(|A| − |B|):

What we want to do:

RESULT = SAT(NEGINV(OpA) + NEGINV(OpB) + SIGN(OpA) + SIGN(OpB))

// How we do it
RESULT = SAT(NEGINV(OpA) + NEGINV(OpB) + SIGN(OpA)) // Only needs one adder as
                                                     // SIGN(OpA) is the carry input.
RESULT = SAT(RESULT + SIGN(OpB))

NEGINV: A function that inverts all bits if the number is negative.

Proof: If x and y are positive the following holds:

SAT(x+y) = SAT(SAT(x)+y)
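
The two-pass evaluation can be compared against the single expression with a short Python sketch. It assumes that SIGN returns 1 for negative operands and 0 otherwise, and that the operands are 16-bit two's complement values held as ordinary Python integers; the function names are illustrative.

```python
def sat16(x):
    """Saturate an integer to the 16-bit two's complement range."""
    return max(-32768, min(32767, x))


def neginv(x):
    """Invert all bits if x is negative (~x equals -x - 1 in two's complement)."""
    return ~x if x < 0 else x


def sign(x):
    """Assumed definition: 1 for negative numbers, 0 otherwise."""
    return 1 if x < 0 else 0


def one_pass(op_a, op_b):
    # "What we want to do": all four terms in a single saturated sum.
    return sat16(neginv(op_a) + neginv(op_b) + sign(op_a) + sign(op_b))


def two_pass(op_a, op_b):
    # "How we do it": SIGN(OpA) is the carry input of the first addition.
    result = sat16(neginv(op_a) + neginv(op_b) + sign(op_a))
    return sat16(result + sign(op_b))


SAMPLES = [-32768, -32767, -12345, -1, 0, 1, 12345, 32766, 32767]
differences = [
    (a, b) for a in SAMPLES for b in SAMPLES if one_pass(a, b) != two_pass(a, b)
]
```

Both intermediate sums are non-negative, so the identity from the proof applies and differences stays empty for these samples. The sketch also shows that NEGINV(x) + SIGN(x) equals the absolute value of x.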


Solution proposal for question 3

[Figure: schematic for question 3. Labelled signals: PC, the constant 1, IMM[15:0], Z, To_PM[15:0]. Control signals: C1, C2, C3.]

Control table:

Operation   C1  C2  C3
PC++        2   0   0
BNE         1   0   0
JSR         0   1   1
RTS         3   2   -

Note that we should ensure that we either push PC+1 or add one to the value that we pop to ensure that we don’t run an instruction close to the JSR instruction twice.
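
The point about the return address can be illustrated with a small behavioural model. The Python sketch below is an assumption-laden simplification: it ignores delay slots, uses an unbounded list as the return stack, and takes BNE as "branch to IMM when Z is clear". The class name Sequencer is hypothetical.

```python
class Sequencer:
    """Behavioural sketch of the PC logic (delay slots ignored)."""

    def __init__(self, pc=0):
        self.pc = pc & 0xFFFF
        self.stack = []

    def step(self, op, imm=0, z=False):
        if op == "PC++":
            next_pc = self.pc + 1
        elif op == "BNE":
            # Assumed semantics: branch when the zero flag is clear.
            next_pc = imm if not z else self.pc + 1
        elif op == "JSR":
            # Push PC + 1 so that RTS resumes after the JSR instruction.
            self.stack.append((self.pc + 1) & 0xFFFF)
            next_pc = imm
        elif op == "RTS":
            next_pc = self.stack.pop()
        else:
            raise ValueError("unknown operation: " + op)
        self.pc = next_pc & 0xFFFF
        return self.pc


seq = Sequencer(pc=10)
trace = [seq.step("JSR", imm=100), seq.step("PC++"), seq.step("RTS")]
```

Here the JSR at address 10 returns to address 11. Pushing PC instead of PC + 1, without adjusting the popped value, would return to address 10 and execute the JSR again.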

b)

function1:
    set r15,#32
    move ar0,r1
    move ar1,r2
    clear acr0
    add r15,#-1
loop:
    bne loop
    mac acr0, DM0[ar0++],DM1[ar1++] // Delay slot
    add r15,#-1 // Delay slot
    rts
    satrnd r0,acr0 // Delay slot

using only three multipliers.

cplx_dotproduct:
    move ar0, r0
    move ar1, r1
    clr acr0
    clr acr1
    repeat 30, endloop
    cplx_mac DM0[ar0++], DM1[ar1++]
endloop:
    sat acr0
    sat acr1
    move r0,HIGH(acr0)
    move r1,HIGH(acr1)
    ret // Ignoring delay slots

matrix_x_vectors:
    move ar0,r4
    move ar1,r5
    move C,r2
    move D,r3
    clr acr0
    clr acr1
    move r2, #0x8000
    move LOW(acr0),r2
    move LOW(acr1),r2
    repeat 128, endloop2
    mat_x_vec r0,r1,DM0[ar0++], DM1[ar1++]
endloop2:
    ret // Ignoring delay slots

1 This should be fairly easy for cplx_dotproduct(). However, matrix_x_vectors() is a rather more interesting challenge...
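
For reference, one well-known way to compute a complex product with three real multiplications is sketched below in Python. The text does not show which decomposition the exam solution has in mind, so this particular factorisation and the function names cplx_mul3 and cplx_dot are assumptions.

```python
def cplx_mul3(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) using three multiplications."""
    k1 = b_re * (a_re + a_im)
    k2 = a_re * (b_im - b_re)
    k3 = a_im * (b_re + b_im)
    # k1 - k3 = a_re*b_re - a_im*b_im, k1 + k2 = a_re*b_im + a_im*b_re
    return k1 - k3, k1 + k2


def cplx_dot(xs, ys):
    """Accumulate the complex dot product of two lists of (re, im) pairs."""
    acc_re, acc_im = 0, 0
    for (x_re, x_im), (y_re, y_im) in zip(xs, ys):
        re, im = cplx_mul3(x_re, x_im, y_re, y_im)
        acc_re += re
        acc_im += im
    return acc_re, acc_im
```

The price of saving a multiplier is three extra additions, and the pre-additions need one more bit than the operands.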


savestate:
    move ar0,r0 // Assume ptr points to a suitable location in DM0
    read r1,LOW(acr0)
    read r2,HIGH(acr0)
    read r3,GUARDS(acr0)
    read r4,LOW(acr1)
    read r5,HIGH(acr1)
    read r6,GUARDS(acr1)
    read r7,C
    read r8,D
    store DM0[ar0++],r1
    store DM0[ar0++],r2
    ...
    store DM0[ar0++],r8
    ret // Ignoring delay slots

restorestate:
    move ar0,r0 // Assume ptr points to a suitable location in DM0
    move r1,DM0[ar0++]
    move r2,DM0[ar0++]
    ...
    move r8,DM0[ar0++]
    set LOW(acr0),r1
    set HIGH(acr0),r2
    set GUARDS(acr0),r3
    set LOW(acr1),r4
    set HIGH(acr1),r5
    set GUARDS(acr1),r6
    set C,r7
    set D,r8
    ret // Ignoring delay slots

[Figure: schematic of the accumulator datapath (ACR0, ACR1 and the C and D registers). Labelled signals: OpA, OpB, DM0[15:0], DM0[31:16], DM1[15:0], DM1[31:16], SAT0OUT, SAT1OUT, To_RF. Accumulator write multiplexer inputs: {OpA[7:0],ACRx[31:0]}, {ACRx[39:32],OpA,ACRx[15:0]}, {ACRx[39:16],OpA}. Read multiplexer inputs: ACRx[31:16], ACRx[15:0], {8'b0, ACRx[39:32]}, C, D. Control signals: Cmode, Cfract, Csatmode, C1, C2, C3, Cw, Cc, Cd. Accumulator write enables: Cw && (x == 0) and Cw && (x == 1).]

Note: x is used to select whether ACR0 or ACR1 is used.

Control table

Operation           Cmode  Cfract  Csatmode  C1  C2  C3  Cw  Cc  Cd
nop                 -      -       -         -   -   -   0   0   0
clr ACRx            -      -       -         -   2   -   1   0   0
mat_x_vec           0      1       0         -   -   -   0   0   0
cplx_mult           1      0       -         -   1   -   1   0   0
sat ACRx            -      -       1         0   3   -   1   0   0
set LOW(ACRx)       -      -       -         3   3   -   1   0   0
set HIGH(ACRx)      -      -       -         2   3   -   1   0   0
set GUARD(ACRx)     -      -       -         1   3   -   1   0   0
set C               -      -       -         -   -   -   0   1   0
set D               -      -       -         -   -   -   0   0   1
read LOW(ACRx)      -      -       -         -   -   1   0   0   0
read HIGH(ACRx)     -      -       -         -   -   0   0   0   0
read GUARDS(ACRx)   -      -       -         -   -   2   0   0   0
read C              -      -       -         -   -   3   0   0   0
read D              -      -       -         -   -   4   0   0   0
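
The read and set operations split a 40-bit accumulator into three fields, which is what the savestate and restorestate routines rely on. The Python sketch below models that split using the bit ranges named in the schematic; the function names are hypothetical, and the accumulator is held as a plain non-negative integer.

```python
MASK40 = (1 << 40) - 1  # accumulators are 40 bits wide (ACRx[39:0])


def read_fields(acr):
    """Return (LOW, HIGH, GUARDS) as the read operations would deliver them."""
    low = acr & 0xFFFF             # ACRx[15:0]
    high = (acr >> 16) & 0xFFFF    # ACRx[31:16]
    guards = (acr >> 32) & 0xFF    # {8'b0, ACRx[39:32]}
    return low, high, guards


def set_low(acr, op_a):
    """{ACRx[39:16], OpA}"""
    return ((acr & MASK40) & ~0xFFFF) | (op_a & 0xFFFF)


def set_high(acr, op_a):
    """{ACRx[39:32], OpA, ACRx[15:0]}"""
    return ((acr & MASK40) & ~(0xFFFF << 16)) | ((op_a & 0xFFFF) << 16)


def set_guards(acr, op_a):
    """{OpA[7:0], ACRx[31:0]}"""
    return (acr & 0xFFFFFFFF) | ((op_a & 0xFF) << 32)


# Save an accumulator value and restore it into a cleared accumulator.
original = 0x123456789A
low, high, guards = read_fields(original)
restored = set_guards(set_high(set_low(0, low), high), guards)
```

After the three set operations the restored value equals the original one, so three 16-bit words per accumulator are enough to save its state.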

Solution proposal for question 5

• See the textbook.

• There are many possible answers here. One possible advantage is the reduced development time, as the floating-point hardware will handle all scaling typically required for fixed-point computations. A possible disadvantage is of course the increased development cost.

• See the textbook.
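
To make the remark about scaling concrete, the Python sketch below multiplies two fractional numbers in fixed point. The Q15 format (1 sign bit, 15 fractional bits) and the helper names are assumptions chosen for illustration; the exam does not prescribe a format.

```python
Q15_ONE = 1 << 15  # assumed Q15 format: 1 sign bit, 15 fractional bits


def to_q15(value):
    """Convert a real number in [-1, 1) to its Q15 integer representation."""
    return int(round(value * Q15_ONE))


def q15_mul(a, b):
    """Multiply two Q15 numbers.

    The raw product has 30 fractional bits, so the programmer must shift it
    right by 15 to get back to Q15. Floating-point hardware does the
    equivalent exponent handling automatically.
    """
    return (a * b) >> 15


def from_q15(value):
    """Convert a Q15 integer back to a real number."""
    return value / Q15_ONE


product = q15_mul(to_q15(0.5), to_q15(0.25))
```

Here product is 4096, which is 0.125 in Q15. Forgetting the shift, or choosing the wrong shift amount after a change of format, is the kind of mistake that floating-point hardware removes.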


Statistics

• Grade U: 4

• Grade 3: 8

• Grade 4: 4

• Grade 5: 4 (Best score: 45)
