2006 5
[Figure: timeline t1 to t4]
I: add r1,r2,r3
J: sub r4,r1,r5
Hazards
Structural hazard: one memory port
[Figure: pipeline timing over Cycle 1 to Cycle 7 for Load, Instr 1, Instr 2, Instr 3, Instr 4. Each instruction starts one cycle after the previous one and passes through Ifetch, ALU, DMem. The DMem access of the Load falls in the same cycle as the Ifetch of Instr 3.]

Structural hazard resolved with a stall
[Figure: the same timing for Load, Instr 1, Instr 2. A Stall is inserted so that the Ifetch of Instr 3 starts one cycle later.]
1: ( / ) 2: ( Harvard Architecture )
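For two instructions i and j, with i ahead of j in program order:
RAW: j tries to read a source before i writes it, so j gets the old value.
WAR: j tries to write a destination before i reads it, so i gets the new value.
WAW: j tries to write an operand before i writes it, so the value left behind is the one written by i.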
Read After Write (RAW): Instr J tries to read an operand before Instr I writes it.
  I: add r1,r2,r3
  J: sub r4,r1,r3
J needs the r1 produced by I: a RAW hazard.
Write After Read (WAR): Instr J writes an operand before Instr I reads it. The conflict comes from reusing the name r1.
  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
Write After Write (WAW): Instr J writes an operand before Instr I writes it. The conflict comes from reusing the name r1.
  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
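General form of an instruction: $r_k \leftarrow (r_i)\ \mathrm{op}\ (r_j)$

RAW (Read After Write):
  $r_k \leftarrow (r_i)\ \mathrm{op}\ (r_j)$
  $r_m \leftarrow (r_k)\ \mathrm{op}\ (r_n)$
WAR (Write After Read):
  $r_i \leftarrow (r_k)\ \mathrm{op}\ (r_j)$
  $r_k \leftarrow (r_m)\ \mathrm{op}\ (r_n)$
WAW (Write After Write):
  $r_k \leftarrow (r_i)\ \mathrm{op}\ (r_j)$
  $r_k \leftarrow (r_m)\ \mathrm{op}\ (r_n)$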
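Range and Domain
$R(i)$: the range of instruction i (the set of locations it writes)
$D(i)$: the domain of instruction i (the set of locations it reads)
For instruction i ahead of instruction j:
RAW: $R(i) \cap D(j) \neq \varnothing$
WAR: $D(i) \cap R(j) \neq \varnothing$
WAW: $R(i) \cap R(j) \neq \varnothing$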
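The three set conditions on R(i) and D(i) can be checked mechanically. The sketch below is only an illustration: the helper names `instr` and `classify_hazards` are hypothetical, and each instruction is assumed to write one register and read the remaining ones.

```python
def instr(dest, *srcs):
    """Describe one instruction by its range R (writes) and domain D (reads)."""
    return {"R": {dest}, "D": set(srcs)}


def classify_hazards(i, j):
    """Return the hazards that are possible when instruction i precedes j."""
    hazards = []
    if i["R"] & j["D"]:   # i writes something j reads
        hazards.append("RAW")
    if i["D"] & j["R"]:   # i reads something j writes
        hazards.append("WAR")
    if i["R"] & j["R"]:   # both write the same location
        hazards.append("WAW")
    return hazards


# The three instruction pairs used in the RAW, WAR and WAW examples.
print(classify_hazards(instr("r1", "r2", "r3"), instr("r4", "r1", "r3")))  # ['RAW']
print(classify_hazards(instr("r4", "r1", "r3"), instr("r1", "r2", "r3")))  # ['WAR']
print(classify_hazards(instr("r1", "r4", "r3"), instr("r1", "r2", "r3")))  # ['WAW']
```

A pair can satisfy more than one condition at once, which is why the function returns a list.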
RAW hazard on r1
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11
[Figure: pipeline timing (Ifetch, ALU, DMem) for the five instructions; every instruction after add reads r1.]
RAW hazard after a Load
  ld  r1, 4(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9
[Figure: pipeline timing (Ifetch, ALU, DMem) for the four instructions; sub needs r1 in its ALU stage while the Load is still in DMem.]
Load RAW hazard resolved with a bubble
  ld  r1, 4(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9
[Figure: the same pipeline timing with one Bubble inserted into sub, and, and or, so each of them is delayed by one cycle behind the Load.]
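The bubble after a Load can be found by a simple check on adjacent instructions. The sketch below assumes a pipeline in which only a Load followed immediately by a consumer of its result needs a stall (that is, all other RAW cases are handled without stalling); the `Instr` tuple and `insert_load_use_bubbles` are hypothetical names used for illustration.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def insert_load_use_bubbles(program: List[Instr]) -> List[str]:
    """Return the issue order, with 'bubble' where a load-use stall is needed."""
    schedule = []
    prev = None
    for ins in program:
        # Assumption: one bubble is enough when the previous instruction
        # is a load whose destination is read by the current instruction.
        if prev is not None and prev.op == "ld" and prev.dest in ins.srcs:
            schedule.append("bubble")
        schedule.append(ins.op)
        prev = ins
    return schedule


program = [
    Instr("ld", "r1", ("r2",)),        # ld  r1, 4(r2)
    Instr("sub", "r4", ("r1", "r6")),  # sub r4,r1,r6
    Instr("and", "r6", ("r1", "r7")),  # and r6,r1,r7
    Instr("or", "r8", ("r1", "r9")),   # or  r8,r1,r9
]
print(insert_load_use_bubbles(program))  # ['ld', 'bubble', 'sub', 'and', 'or']
```

Only one bubble appears in the issue order; the later instructions are delayed simply because they follow sub.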
WAR and WAW
WAR:
  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
WAW:
  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
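[Figure: timing over cycles 1 to 8]
  I: Fadd r1, r4, r3
  J: mul r6, r1, r7

[Figure: pipeline with multiple functional units. IF, ID, Issue, then ALU, Fadd, Fmul, Fdiv in parallel, then WB. Register files: GPRs and FPRs. Latencies shown: 1 clock, 3 clocks.]
  I: Fmul r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

  I: Fdiv r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7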
WAR and WAW hazards: In-order Issue, Out-of-order Complete
  DIVD F0, F2, F4
  ADDD F10, F0, F8
  SUBD F12, F8, F10

Out of Order => WAR and WAW hazards:
  DIVD F0, F2, F4
  ADDD F10, F0, F8
  SUBD F0, F8, F14
  MULD F6, F10, F0

With F0 renamed to F100 in SUBD and MULD:
  DIVD F0, F2, F4
  ADDD F10, F0, F8
  SUBD F100, F8, F14
  MULD F6, F10, F100
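The renaming step in the last sequence can be sketched as a single pass over the code. This is an illustration under stated assumptions, not the hardware mechanism: `rename_registers` is a hypothetical helper, a destination is renamed only if an earlier instruction already read or wrote it, and fresh names start at F100 and are assumed not to clash with existing registers.

```python
def rename_registers(program, first_free=100):
    """Give a fresh name to every destination that reuses an earlier register."""
    seen = set()       # registers read or written by earlier instructions
    current = {}       # architectural name -> latest renamed name
    next_free = first_free
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources always refer to the most recent producer of the name.
        s1 = current.get(src1, src1)
        s2 = current.get(src2, src2)
        seen.update((src1, src2))
        if dest in seen:
            # Reused name: allocating a fresh register removes WAR and WAW.
            new_dest = f"F{next_free}"
            next_free += 1
            current[dest] = new_dest
        else:
            new_dest = dest
        seen.add(dest)
        renamed.append((op, new_dest, s1, s2))
    return renamed


program = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F0", "F8", "F14"),
    ("MULD", "F6", "F10", "F0"),
]
for line in rename_registers(program):
    print(line)
```

Run on the four-instruction sequence, the pass leaves DIVD and ADDD unchanged and rewrites the last two instructions to use F100, matching the renamed listing.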
Scoreboard: CDC 6600
CDC6600 (Commit) ( )
CDC6600 WAR WAW WAR WAW
1. Issue
2. Read operands
3. Execution
4. Write Result
CDC6600 speedup: 1.7 for FORTRAN programs, 2.5 for hand-coded assembly
6600 (load/store)
Tomasulo
1. IBM 360/91 (1967)
2. 1967, IBM, Robert Tomasulo: Register Renaming; ISA (Instruction Set Architecture)
3. :
4. Tomasulo's approach reappears in 1990s processors: Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604
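[Figure: Tomasulo organization. FP adder reservation stations Add1, Add2, Add3; FP multiplier reservation stations Mult1, Mult2; results are returned on the Common Data Bus (CDB).]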
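Tomasulo
Control and buffers are distributed with the functional units (FU).
The FU buffers are called Reservation Stations (RS) and hold pending operands.
Registers named in instructions are replaced by values or by pointers to reservation stations; this renaming avoids WAR and WAW hazards.
Results go to the FUs from the RS over the Common Data Bus, which broadcasts each result to all FUs.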
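Reservation station fields
Op: operation to perform (for example + or -)
Vj, Vk: values of the source operands
Qj, Qk: reservation stations that will produce the source operands; Qj,Qk=0 means the operand is ready
Busy: the reservation station and its FU are in use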
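Three stages of the algorithm
1. Issue: take an instruction from the FP operation queue; if a reservation station is free (no structural hazard), issue the instruction and send its operands (this renames the registers).
2. Execution (EX): operate once both operands are ready; if one is missing, watch the CDB for it.
3. Write result (WB): when execution finishes, put the result on the CDB for all waiting units.
CDB: data + source (where the result comes from): 64 bits of data + 4 bits of FU source address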
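The interaction of the Vj/Vk and Qj/Qk fields with the CDB can be sketched in a few lines. This is a simplified illustration, not the IBM 360/91 design: `issue`, `broadcast`, and the `regs` / `reg_status` dictionaries are hypothetical names, and `None` plays the role of Qj,Qk=0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing station; None means "ready"
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src_j, src_k, regs, reg_status):
    """Issue: copy ready operands, record tags for pending ones, rename dest."""
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src_j)
    rs.vj = regs[src_j] if rs.qj is None else None
    rs.qk = reg_status.get(src_k)
    rs.vk = regs[src_k] if rs.qk is None else None
    reg_status[dest] = rs.name   # dest will be produced by this station


def broadcast(tag, value, stations, regs, reg_status):
    """Write result: every waiting station and register captures the value."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value
        del reg_status[reg]


regs = {"f2": 3.0, "f4": 0.0, "f6": 0.0}
reg_status = {"f4": "Load2"}            # f4 is still being loaded
mult1 = ReservationStation("Mult1")
issue(mult1, "MULTD", "f6", "f4", "f2", regs, reg_status)
waiting = mult1.ready()                 # False: still waiting on Load2
broadcast("Load2", 5.0, [mult1], regs, reg_status)
print(waiting, mult1.ready(), mult1.vj, mult1.vk)  # False True 5.0 3.0
```

The station never reads f4 from the register file after issue; it receives the value directly from the bus, which is what lets later writers of f4 proceed without a WAR hazard.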
Tomasulo
Instruction               Latency
I1  LD    f2, 34(r2)      1
I2  LD    f4, 45(r3)
I3  MULTD f6, f4, f2      3
I4  SUBD  f8, f5, f2      1
I5  DIVD  f4, f2, f8      4
I6  ADDD  f10, f6, f4     1

In-order:  1 (2,1) . . . . . . 2 3 4 4 3 5 . . . 5 6 6
[Figure: dependence graph over instructions 1 to 6]
Out-of-order Issue, Out-of-order Execution: RAW, WAR, WAW
Same six instructions (I1 to I6):
In-order:      1 (2,1) . . . . . . 2 3 4 4 3 5 . . . 5 6 6
Out-of-order:  1 (2,1) 4 4 . . . . 2 3 . . 3 5 . . . 5 6 6
Same six instructions (I1 to I6):
In-order:                    1 (2,1) . . . . . . 2 3 4 4 3 5 . . . 5 6 6
Out-of-order with renaming:  1 (2,1) 4 4 5 . . . 2 3 5 . 3 6 6
Tomasulo <=14 <=5 WAR WAW
17 / Taken Not Taken
ILP
/* original loop, 4k iterations */
for (i = 0; i < 4096; i++)
    A[i] = B[i] + C[i];

/* unrolled four times */
for (i = 0; i < 4096; i += 4) {
    A[i]   = B[i]   + C[i];
    A[i+1] = B[i+1] + C[i+1];
    A[i+2] = B[i+2] + C[i+2];
    A[i+3] = B[i+3] + C[i+3];
}
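The unrolled loop relies on the trip count being a multiple of 4. A minimal sketch of the general case is shown below, with a cleanup loop for the leftover iterations; the function names are hypothetical, and Python is used only to show the transformation, not to gain speed.

```python
def add_rolled(b, c):
    """Reference version: one element per iteration."""
    a = [0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + c[i]
    return a


def add_unrolled_by_4(b, c):
    """Unrolled by 4, plus a cleanup loop when len(b) is not a multiple of 4."""
    n = len(b)
    a = [0] * n
    main = n - n % 4                  # largest multiple of 4 not above n
    for i in range(0, main, 4):       # one loop test per four elements
        a[i] = b[i] + c[i]
        a[i + 1] = b[i + 1] + c[i + 1]
        a[i + 2] = b[i + 2] + c[i + 2]
        a[i + 3] = b[i + 3] + c[i + 3]
    for i in range(main, n):          # at most three leftover elements
        a[i] = b[i] + c[i]
    return a


b = list(range(10))
c = list(range(100, 110))
print(add_unrolled_by_4(b, c) == add_rolled(b, c))  # True
```

The four statements in the unrolled body are independent of one another, which is what exposes the extra ILP to the scheduler.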
Before scheduling:
  SUB R4, R5, R6
  ADD R1, R2, R3
  if R1=0 then
After scheduling (SUB moved behind the branch):
  ADD R1, R2, R3
  if R1=0 then
  SUB R4, R5, R6

> 95%: BHT (Branch History Table), BTB (Branch Target Buffer)
85% J Z 60% J Z 60~70%
if (x[i] < 7)
    y += 1;
if (x[i] < 5)
    c -= 4;
[Figure: sequential instruction stream i-1, i, i+1, i+2 and branch target stream p+1, p+2]
1 0
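n-bit counter: values range from 0 to $2^n-1$. A taken branch adds 1 (up to at most $2^n-1$); a not-taken branch subtracts 1 (down to at least 0). Predict taken when the counter is at least $2^{n-1}$.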
n = 2: states 11, 10, 01, 00
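The four states 11, 10, 01, 00 behave as a saturating counter. The sketch below assumes the usual convention that the upper half of the range (11 and 10) predicts taken; `SaturatingCounter` and `count_mispredictions` are hypothetical names, and the loop branch pattern is an invented example.

```python
class SaturatingCounter:
    """n-bit saturating counter; the upper half of the range predicts taken."""

    def __init__(self, bits=2, value=0):
        self.max_value = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.value = value

    def predict(self):
        return self.value >= self.threshold

    def update(self, taken):
        if taken:
            self.value = min(self.max_value, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


def count_mispredictions(outcomes, bits, start):
    counter = SaturatingCounter(bits, start)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# A loop branch: taken 9 times, then not taken; the loop runs twice.
loop_branch = ([True] * 9 + [False]) * 2
print(count_mispredictions(loop_branch, bits=2, start=3))  # 2
print(count_mispredictions(loop_branch, bits=1, start=1))  # 3
```

With two bits, the single not-taken outcome at loop exit only moves the counter from 11 to 10, so the first iteration of the next pass is still predicted correctly; the 1-bit version mispredicts it.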
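[Figure: branch history table. The Fetch PC (low two bits 0 0) addresses the I-Cache; k bits of the Fetch PC form the BHT Index into a 2^k-Entry BHT, 2 bits/Entry. The Opcode of the fetched instruction answers "Branch?", PC + Offset gives the Target PC, and the BHT entry answers "Taken / Not Taken".]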
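(1) $2^n$ entries
(2) 4096 entries (12 index bits)
(3) 2-bit predictor with 4096 entries: prediction accuracy 82% to 99%, misprediction rate 1% to 18%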
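A table of 4096 two-bit counters indexed by 12 PC bits can be sketched as follows. Assumptions in this illustration: instructions are 4 bytes, so the two low PC bits are dropped before indexing; counters start at 01; the table has no tags, so different branches may share an entry. `BranchHistoryTable` is a hypothetical name.

```python
class BranchHistoryTable:
    """Untagged table of 2-bit saturating counters indexed by PC bits."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)   # assumed initial state 01

    def index(self, pc):
        return (pc >> 2) & self.mask              # drop the two low PC bits

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable(index_bits=12)
print(len(bht.counters), 2 * len(bht.counters))   # 4096 entries, 8192 bits

before = bht.predict(0x1000)      # False: counter starts at 01
bht.update(0x1000, True)
bht.update(0x1000, True)
after = bht.predict(0x1000)       # True: counter is now 11
aliased = bht.predict(0x5000)     # True: 0x5000 maps to the same entry
print(before, after, aliased)
```

The last line shows the cost of leaving out tags: a branch at 0x5000 inherits the history of the branch at 0x1000 because both addresses map to the same index.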
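BTB: looked up with the PC in the IF stage
[Figure: BTB flowchart. The PC is sent to memory and to the BTB. Entry found in BTB? No: if the instruction turns out to be a taken branch, enter its PC and target PC into the BTB; otherwise continue normally. Yes: fetch from the predicted PC; if the branch is taken, the prediction was correct; if not, restart fetch at the other PC and delete the entry from the BTB.]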
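[Figure: BTB organization. The Fetch PC goes to the I-Cache and, through k index bits (BTB Index), to the BTB. Each entry holds an Entry PC, a Valid bit, and a Target PC. If the Entry PC equals (=) the Fetch PC and the entry is Valid, there is a Match and the Target is used.]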
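The Entry PC comparison and the insert/delete policy can be sketched as below. This is an illustration under assumptions: a direct-mapped table, 4-byte instructions, entries kept only for branches that were last taken, and hypothetical names `BranchTargetBuffer`, `next_fetch_pc`, and `resolve`.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each valid entry is (entry_pc, target_pc)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.entries = [None] * (1 << index_bits)   # None means not valid

    def _index(self, pc):
        return (pc >> 2) & self.mask

    def next_fetch_pc(self, pc):
        """IF stage: on a valid match use the target, else fall through."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return pc + 4

    def resolve(self, pc, taken, target):
        """After the branch executes: insert if taken, delete if not taken."""
        idx = self._index(pc)
        if taken:
            self.entries[idx] = (pc, target)
        elif self.entries[idx] is not None and self.entries[idx][0] == pc:
            self.entries[idx] = None


btb = BranchTargetBuffer(index_bits=4)
first = btb.next_fetch_pc(0x40)          # no entry yet: fall through
btb.resolve(0x40, True, 0x100)           # branch at 0x40 was taken to 0x100
second = btb.next_fetch_pc(0x40)         # hit: predicted target
other = btb.next_fetch_pc(0x80)          # same index, different Entry PC
btb.resolve(0x40, False, 0x100)          # mispredicted: entry removed
third = btb.next_fetch_pc(0x40)
print(hex(first), hex(second), hex(other), hex(third))  # 0x44 0x100 0x84 0x44
```

Unlike a table of counters alone, the stored Entry PC prevents the instruction at 0x80 from being redirected by the branch at 0x40, even though both map to the same index.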