Microsoft PowerPoint - CA_04 Chapter6 v ppt


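Chap. 6 Enhancing Performance with Pipelining
Prof. An-Yeu Wu (吳安宇), Department of Electrical Engineering, National Taiwan University (臺大電機系)
V1, 2007/04/20
Course: Computer Architecture (計算機結構)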

Outline
  6.1 An Overview of Pipelining
  6.2 A Pipelined Datapath
  6.3 Pipelined Control
  6.4 Data Hazards and Forwarding
  6.5 Data Hazards and Stalls
  6.6 Branch Hazards
  6.8 Exceptions (not covered)
  6.9 Superscalar and dynamic pipelining (not covered)

Pipelining is Natural!
  Ann, Brian, Cathy, and Don each have dirty clothes to be washed, dried, folded, and put away.
  The washer, dryer, folder, and storer each take 30 minutes for their task.

Sequential laundry
  Sequential laundry takes 8 hours for 4 loads.
  If they learned pipelining, how long would it take?

Pipelined laundry
  Pipelined laundry takes 3.5 hours for 4 loads.
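The two laundry totals follow from simple counting. The sketch below is a minimal illustration in Python, assuming four stages of 30 minutes each as stated on the slides; the function names are made up for this example.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load needs `stages` slots; every later load finishes one slot after the previous one."""
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    seq = sequential_minutes(loads=4, stages=4, stage_minutes=30)
    pipe = pipelined_minutes(loads=4, stages=4, stage_minutes=30)
    print(f"sequential: {seq / 60} hours, pipelined: {pipe / 60} hours")
```

With 4 loads the pipeline is still filling and draining for much of the time, which is why the gain is smaller than the number of stages.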

Single-, Multi-cycle, vs. Pipeline

Why Pipeline? Because the Resources Are There!

Pipelining MIPS Execution

Pipeline Hazards
  Structural hazard: an occurrence in which a planned instruction cannot execute in the proper clock cycle because the hardware cannot support the combination of instructions that are set to execute in the given clock cycle.
  Data hazard (also called pipeline data hazard): an occurrence in which a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.
  Control hazard (also called branch hazard): an occurrence in which the proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is NOT the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

Data hazard
  Load-use data hazard: a specific form of data hazard in which the data requested by a load instruction has not yet become available when it is requested.
  Pipeline stall (also called bubble): a stall initiated in order to resolve a hazard.
  Solution: forwarding (also called bypassing), a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.

Control hazard
  Untaken branch: one that falls through to the successive instruction. A taken branch is one that causes transfer to the branch target.
  Solution: branch prediction, a method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

Performance Index of Pipelining
  Latency (pipeline): the number of stages in a pipeline, or the number of stages between two instructions during execution.
  Throughput (pipeline): the number of instructions executed per unit time.

Outline
  6.1 An Overview of Pipelining
  6.2 A Pipelined Datapath
  6.3 Pipelined Control
  6.4 Data Hazards and Forwarding
  6.5 Data Hazards and Stalls
  6.6 Branch Hazards
  6.8 Exceptions
  6.9 Superscalar and dynamic pipelining

Designing a Pipelined Processor
  Examine the datapath and control diagram
    Starting with single- or multi-cycle datapath?
    Single- or multi-cycle control?
  Partition datapath into stages:
    IF (instruction fetch), ID (instruction decode and register file read), EX (execution or address calculation), MEM (data memory access), WB (write back)
  Associate resources with states
  Ensure that flows do not conflict, or figure out how to resolve
  Assert control in appropriate stage

Use Multi-cycle Execution Steps
  But use the single-cycle datapath (separate memories, why?)

Split Single-cycle Datapath
  What to add to split the datapath into stages?

Add Pipeline Registers
  Use registers between stages to carry data and control

Consider load
  IF (Instruction Fetch): fetch the instruction from the Instruction Memory
  ID (Instruction Decode): registers fetch and instruction decode
  EX: calculate the memory address
  MEM: read the data from the Data Memory
  WB: write the data back to the register file

Pipelining load
  The 5 functional units in the pipeline datapath are:
    Instruction Memory for the Ifetch stage
    Register File's read ports (busA and busB) for the Reg/Dec stage
    ALU for the Exec stage
    Data Memory for the MEM stage
    Register File's write port (busW) for the WB stage

IF Stage of load word
  IF/ID = Mem[PC]; PC = PC + 4

ID Stage of load word
  ID/EX(A) = Reg[IR[25-21]]; ID/EX(B) = Reg[IR[20-16]]; ID/EX(Imm) = sign-ext(IR[15-0])

EX Stage of load word
  EX/MEM = A + sign-ext(IR[15-0])   (address computation)

MEM Stage of load word
  MEM/WB = Mem[ALUout]

WB Stage of load word
  Reg[IR[20-16]] = MEM/WB
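The five register-transfer steps for load word can be traced in a few lines of Python. This is only a sketch under simplifying assumptions: memories are modelled as dictionaries keyed by byte address, one instruction is walked through the stages in isolation, and the dictionary field names (`IR`, `A`, `B`, `Imm`, `Rt`, `ALUout`, `Data`) are labels chosen for this example rather than names taken from the slides.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_load(pc, regs, imem, dmem):
    """Walk a single lw through IF, ID, EX, MEM, WB; return the updated PC."""
    # IF: IF/ID = Mem[PC]; PC = PC + 4
    if_id = {"IR": imem[pc]}
    pc += 4

    # ID: read both register operands and sign-extend the immediate
    ir = if_id["IR"]
    rs = (ir >> 21) & 0x1F          # IR[25-21]
    rt = (ir >> 16) & 0x1F          # IR[20-16]
    id_ex = {"A": regs[rs], "B": regs[rt], "Imm": sign_extend16(ir), "Rt": rt}

    # EX: address computation
    ex_mem = {"ALUout": id_ex["A"] + id_ex["Imm"], "Rt": id_ex["Rt"]}

    # MEM: read the data memory
    mem_wb = {"Data": dmem[ex_mem["ALUout"]], "Rt": ex_mem["Rt"]}

    # WB: Reg[IR[20-16]] = MEM/WB
    regs[mem_wb["Rt"]] = mem_wb["Data"]
    return pc


if __name__ == "__main__":
    # lw $10, 20($1): opcode 0x23, rs = 1, rt = 10, immediate = 20
    lw = (0x23 << 26) | (1 << 21) | (10 << 16) | 20
    regs = [0] * 32
    regs[1] = 100
    new_pc = run_load(0, regs, {0: lw}, {120: 42})
    print(new_pc, regs[10])
```

Note that the destination register number travels down the pipeline registers together with the data, so that WB still knows where to write three stages after decode.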

Pipelined Datapath

The Four Stages of R-type
  IF: fetch the instruction from the Instruction Memory
  ID: registers fetch and instruction decode
  EX: ALU operates on the two register operands; update PC
  WB: write ALU output back to the register file

Pipelining R-type and load
  We have a structural hazard: two instructions try to write to the register file at the same time!
  Only one write port

Important Observation
  Each functional unit can only be used once per instruction
  Each functional unit must be used at the same stage for all instructions:
    Load uses Register File's write port during its 5th stage
    R-type uses Register File's write port during its 4th stage
  Several ways to solve: 1) forwarding, 2) adding pipeline bubble, 3) making instructions same length

Solution 1: Insert Bubble
  Insert a bubble into the pipeline to prevent two writes in the same cycle
    The control logic can be complex
    Lose instruction fetch and issue opportunity
  No instruction is started in Cycle 6!

Solution 2: Delay R-type's Write
  Delay R-type's register write by one cycle:
    R-type also uses the Reg File's write port at Stage 5
    MEM is a NOP stage: nothing is being done
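The write-port conflict, and the effect of padding R-type to five stages, can be checked mechanically. The following Python sketch assumes one instruction is issued per cycle starting in cycle 1; the function name and the `wb_stage` table are hypothetical helpers for this illustration.

```python
def write_port_conflicts(kinds, wb_stage):
    """Map each cycle with more than one register-file write to the instructions involved.

    kinds    -- instruction kinds in program order, e.g. ["lw", "rtype"]
    wb_stage -- stage number (1-based) in which each kind uses the write port
    """
    usage = {}
    for index, kind in enumerate(kinds):
        start_cycle = index + 1                      # one instruction issued per cycle
        write_cycle = start_cycle + wb_stage[kind] - 1
        usage.setdefault(write_cycle, []).append(index)
    return {cycle: users for cycle, users in usage.items() if len(users) > 1}


if __name__ == "__main__":
    program = ["lw", "rtype"]
    # R-type writes in its 4th stage, load in its 5th: both hit the port in cycle 5.
    print(write_port_conflicts(program, {"lw": 5, "rtype": 4}))
    # Solution 2: delay the R-type write to stage 5, so every kind writes in the same stage.
    print(write_port_conflicts(program, {"lw": 5, "rtype": 5}))
```

Once every instruction uses the write port in the same stage, two instructions issued in different cycles can never collide on it.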

The Four Stages of store
  IF: fetch the instruction from the Instruction Memory
  ID: registers fetch and instruction decode
  EX: calculate the memory address
  MEM: write the data into the Data Memory
  Add an extra stage:
    WB: NOP

The Four Stages of beq
  IF: fetch the instruction from the Instruction Memory
  ID: registers fetch and instruction decode
  EX: compare the two register operands, select the correct branch target address, and latch it into the PC
  Add two extra stages:
    MEM: NOP
    WB: NOP

Multiple-clock-cycle pipeline diagram
  Can help with answering questions like:
    How many cycles does it take to execute this code?
    What is the ALU doing during cycle 4?
  Helps in understanding datapaths

Multiple-clock-cycle pipeline
  Timing diagram

Traditional multi-clock-cycle pipeline
  Timing diagram

Single-clock-cycle Timing Diagram
  A vertical slice through a multiple-clock-cycle diagram

Example 1: Cycles 1-6 [figures: single-clock-cycle diagrams, one per cycle]

Outline
  6.1 An Overview of Pipelining
  6.2 A Pipelined Datapath
  6.3 Pipelined Control
  6.4 Data Hazards and Forwarding
  6.5 Data Hazards and Stalls
  6.6 Branch Hazards
  6.8 Exceptions
  6.9 Superscalar and dynamic pipelining

Pipeline Control: Control Signals

Group Signals According to Stages
  Can use control signals of single-cycle CPU

Control Signal Details

Control Signal Details

Data Stationary Control
  Pass control signals along just like the data
  Main control generates control signals during ID

Data Stationary Control (cont.)
  Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later
  Signals for MEM (MemWr, Branch) are used 2 cycles later
  Signals for WB (MemtoReg, RegWr) are used 3 cycles later
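The 1, 2, and 3 cycle delays fall out of simply shifting the control word through the pipeline registers. Below is a minimal Python sketch of that shifting; the grouping of signals into `EX`, `MEM`, and `WB` sub-dictionaries and the function name `propagate` are assumptions made for this illustration, and the signal values for `lw` are one plausible setting rather than a table taken from the slides.

```python
def propagate(decoded_per_cycle):
    """Return, for each cycle, the control signals consumed by EX, MEM, and WB.

    decoded_per_cycle[t] is the control word produced by main control in ID
    during cycle t (or None when nothing is decoded in that cycle).
    """
    id_ex = ex_mem = mem_wb = None
    trace = []
    for decoded in list(decoded_per_cycle) + [None, None, None]:
        # Signals in use this cycle come from the pipeline registers.
        trace.append({
            "EX": id_ex["EX"] if id_ex else None,
            "MEM": ex_mem["MEM"] if ex_mem else None,
            "WB": mem_wb["WB"] if mem_wb else None,
        })
        # Clock edge: each register takes the remaining fields from the one before it.
        mem_wb = {"WB": ex_mem["WB"]} if ex_mem else None
        ex_mem = {"MEM": id_ex["MEM"], "WB": id_ex["WB"]} if id_ex else None
        id_ex = decoded
    return trace


if __name__ == "__main__":
    lw_control = {
        "EX": {"ExtOp": 1, "ALUSrc": 1},
        "MEM": {"MemWr": 0, "Branch": 0},
        "WB": {"MemtoReg": 1, "RegWr": 1},
    }
    for offset, used in enumerate(propagate([lw_control])):
        print(f"ID+{offset}:", used)
```

Each pipeline register only needs to carry the fields that later stages still require, so the control word gets narrower as it moves down the pipe.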

Datapath with Control

Let's Try it Out
  Sample assembly program:
    lw  $10, 20($1)
    sub $11, $2, $3
    and $12, $4, $5
    or  $13, $6, $7
    add $14, $8, $9
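A multiple-clock-cycle diagram for this sample program can be generated rather than drawn by hand. The Python sketch below assumes an ideal five-stage pipeline with no stalls (the five instructions are independent of one another); `pipeline_diagram`, `in_stage`, and `render` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(program):
    """Return (total_cycles, rows); rows[i] pairs instruction i with its per-cycle stage labels."""
    total = len(program) + len(STAGES) - 1
    rows = []
    for index, instr in enumerate(program):
        cells = [""] * total
        for offset, stage in enumerate(STAGES):
            cells[index + offset] = stage      # instruction i enters IF in cycle i + 1
        rows.append((instr, cells))
    return total, rows


def in_stage(program, stage, cycle):
    """Which instruction occupies `stage` during 1-based `cycle`? None if the stage is empty."""
    index = cycle - 1 - STAGES.index(stage)
    return program[index] if 0 <= index < len(program) else None


def render(program):
    """Format the diagram as text, one row per instruction and one column per cycle."""
    total, rows = pipeline_diagram(program)
    lines = [" " * 18 + "".join(f"{c:>5}" for c in range(1, total + 1))]
    for instr, cells in rows:
        lines.append(f"{instr:<18}" + "".join(f"{c:>5}" for c in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    program = [
        "lw  $10, 20($1)",
        "sub $11, $2, $3",
        "and $12, $4, $5",
        "or  $13, $6, $7",
        "add $14, $8, $9",
    ]
    print(render(program))
    print("ALU in cycle 4:", in_stage(program, "EX", 4))
```

Reading one column of the output gives the single-clock-cycle view; reading one row gives the life of a single instruction.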

Example 2: Cycles 1-9 [figures: datapath with control, one per cycle]

Summary of Pipeline Basics
  Pipelining is a fundamental concept
    Multiple steps using distinct resources
  Utilize capabilities of datapath by pipelined instruction processing
    Start next instruction while working on the current one
    Limited by length of longest stage (plus fill/flush)
    Need to detect and resolve hazards
  What makes it easy in MIPS?
    All instructions are of the same length
    Just a few instruction formats
    Memory operands only in loads and stores
  What makes pipelining hard? Hazards
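The remark that a pipeline is limited by its longest stage, plus fill and flush time, can be made concrete with a small timing model. In this Python sketch the stage delays are assumed values chosen only for illustration (they are not given in the slides), and hazards are ignored.

```python
def single_cycle_time(n_instructions, stage_ps):
    """Single-cycle design: the clock period must cover every stage back to back."""
    return n_instructions * sum(stage_ps)


def pipelined_time(n_instructions, stage_ps):
    """Pipelined design: the clock period is set by the slowest stage, plus cycles to fill the pipe."""
    clock = max(stage_ps)
    return (len(stage_ps) + n_instructions - 1) * clock


if __name__ == "__main__":
    # Assumed delays in picoseconds for IF, ID, EX, MEM, WB (illustrative only).
    stage_ps = [200, 100, 200, 200, 100]
    n = 1000
    t_single = single_cycle_time(n, stage_ps)
    t_pipe = pipelined_time(n, stage_ps)
    print(f"single-cycle: {t_single} ps, pipelined: {t_pipe} ps, speedup: {t_single / t_pipe:.2f}")
```

Because the stages are unbalanced, the speedup stays below the stage count of five even for long instruction streams; balancing stage delays is what brings it closer to the ideal.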

Outline
  6.1 An Overview of Pipelining
  6.2 A Pipelined Datapath
  6.3 Pipelined Control
  6.4 Data Hazards and Forwarding
  6.5 Data Hazards and Stalls
  6.6 Branch Hazards
  6.8 Exceptions
  6.9 Superscalar and dynamic pipelining

Data Hazards
  Order of operand accesses changed by pipeline
  Starting next instruction before first is finished
  Dependencies go backward in time

Handling Data Hazards
  Detect
  Resolve remaining ones
    Compiler inserts NOP
    Stall
    Forward

Software Solution
  Have compiler guarantee no hazards
  Where do we insert the NOPs?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
  Problem: not efficient enough!
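One way to answer "where do we insert the NOPs?" is a simple compiler-style pass. The Python sketch below rests on an assumption not stated on this slide: the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer (two NOPs between back-to-back dependent instructions). The tuple format and the name `insert_nops` are hypothetical.

```python
def insert_nops(program, min_distance=3):
    """Insert 'nop' so every reader is at least `min_distance` slots after the writer.

    program -- list of (text, dest_register_or_None, source_registers)
    Returns the padded list of instruction texts.
    """
    padded = []                                   # (text, dest) pairs emitted so far
    for text, _dest, sources in program:
        while True:
            # Only the last (min_distance - 1) slots can still be "too close".
            recent = padded[-(min_distance - 1):] if min_distance > 1 else []
            if not any(d is not None and d in sources for _, d in recent):
                break
            padded.append(("nop", None))
        padded.append((text, _dest))
    return [text for text, _ in padded]


if __name__ == "__main__":
    program = [
        ("sub $2, $1, $3", "$2", ["$1", "$3"]),
        ("and $12, $2, $5", "$12", ["$2", "$5"]),
        ("or $13, $6, $2", "$13", ["$6", "$2"]),
        ("add $14, $2, $2", "$14", ["$2", "$2"]),
        ("sw $15, 100($2)", None, ["$15", "$2"]),
    ]
    for line in insert_nops(program):
        print(line)
```

Under that assumption the pass pads only after `sub`, and those slots do no useful work, which is the inefficiency the slide points out.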

Detecting Data Hazards
  Hazard conditions:
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
  Two optimizations:
    Don't forward if the instruction does not write a register: check if RegWrite is asserted
    Don't forward if the destination register is $0: check if RegisterRd = 0

Detecting Data Hazards (cont.)
  Hazard conditions using control signals:
    At EX stage (EX hazard):
      if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10

Detecting Data Hazards (cont.)
  Hazard conditions using control signals:
    At MEM stage:
      MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (MEM/WB.RegRd = ID/EX.RegRs)
    (replace ID/EX.RegRs with ID/EX.RegRt for the other two conditions)

Resolving Hazards: Forwarding
  Use temporary results, e.g., those in pipeline registers; don't wait for them to be written

Forwarding Logic
  Forwarding: input to ALU from any pipeline register
    Add multiplexors to ALU input
    Control forwarding in EX: carry Rs in ID/EX
  Control signals for forwarding:
    If both WB and MEM forward, e.g., add $1,$1,$2; add $1,$1,$3; add $1,$1,$4; => let MEM forward
    EX hazard:
      if (EX/MEM.RegWrite and (EX/MEM.RegRd ≠ 0) and (EX/MEM.RegRd = ID/EX.RegRs)) ForwardA = 10
    MEM hazard:
      if (MEM/WB.RegWrite and (MEM/WB.RegRd ≠ 0) and (EX/MEM.RegRd ≠ ID/EX.RegRs) and (MEM/WB.RegRd = ID/EX.RegRs)) ForwardA = 01
    (ID/EX.RegRt <-> ID/EX.RegRs, ForwardB <-> ForwardA)
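The forwarding conditions translate almost directly into code. The sketch below is a Python model of the forwarding unit, not a hardware description; pipeline registers are represented as dictionaries with `RegWrite` and `Rd` keys, and the function names are chosen for this example. Testing the EX hazard first and the MEM hazard second gives the most recent result priority, which is the intent of the "let MEM forward" rule.

```python
def forward_select(ex_mem, mem_wb, source_reg):
    """Return the 2-bit mux select for one ALU input (0b10, 0b01, or 0b00)."""
    # EX hazard: result of the immediately preceding instruction, held in EX/MEM.
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == source_reg:
        return 0b10
    # MEM hazard: result from two instructions back, held in MEM/WB.
    # Reached only when the EX hazard did not match, so the newer value wins.
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == source_reg:
        return 0b01
    return 0b00                                   # no hazard: use the register file value


def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(ex_mem, mem_wb, id_ex_rs),
            forward_select(ex_mem, mem_wb, id_ex_rt))


if __name__ == "__main__":
    # add $1,$1,$2; add $1,$1,$3; add $1,$1,$4 with the third add in EX:
    ex_mem = {"RegWrite": True, "Rd": 1}          # second add
    mem_wb = {"RegWrite": True, "Rd": 1}          # first add
    forward_a, forward_b = forwarding_unit(ex_mem, mem_wb, id_ex_rs=1, id_ex_rt=4)
    print(f"ForwardA={forward_a:02b} ForwardB={forward_b:02b}")
```

The check against register 0 matters because `$0` always reads as zero in MIPS, so a value "written" to it must never be forwarded.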

No Forwarding

With Forwarding

Pipeline with Forwarding

Example 3: Cycles 3-6 [figures: pipeline with forwarding, one per cycle]

Can't Always Forward
  lw can still cause a hazard if it is followed by an instruction that reads the loaded register.

Stalling
  Stall the pipeline by keeping instructions in the same stage and inserting a NOP instead

Handling Stalls
  Hazard detection unit in ID inserts a stall between a load instruction and its use:
    if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
      stall the pipeline for one cycle
    (ID/EX.MemRead = 1 indicates a load instruction)
  How to stall?
    Stall the instructions in IF and ID: do not change PC and IF/ID => the stages re-execute the instructions
    What to move into EX: insert a NOP by changing the EX, MEM, WB control fields of the ID/EX pipeline register to 0
      As the control signals propagate, all control signals to EX, MEM, WB are deasserted and no registers or memories are written
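The load-use test and the three actions it triggers can be sketched as a small Python function. The output names `PCWrite`, `IFIDWrite`, and `InsertBubble` are labels assumed for this illustration (the slides describe the actions but do not name the signals), and pipeline registers are again modelled as dictionaries.

```python
def hazard_detection(id_ex, if_id):
    """Decide whether the instruction in ID must wait one cycle for a load in EX.

    id_ex -- {"MemRead": bool, "Rt": int}   load destination travels in the Rt field
    if_id -- {"Rs": int, "Rt": int}         source fields of the instruction being decoded
    """
    stall = bool(id_ex["MemRead"] and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"]))
    return {
        "PCWrite": not stall,       # hold PC so the same instruction is fetched again
        "IFIDWrite": not stall,     # hold IF/ID so the same instruction is decoded again
        "InsertBubble": stall,      # zero the EX, MEM, WB control fields written into ID/EX
    }


if __name__ == "__main__":
    # lw $2, 20($1) is in EX while and $4, $2, $5 is in ID.
    load_in_ex = {"MemRead": True, "Rt": 2}
    and_in_id = {"Rs": 2, "Rt": 5}
    print(hazard_detection(load_in_ex, and_in_id))
```

After the one-cycle bubble the loaded value is in MEM/WB, so the ordinary forwarding path can deliver it to the ALU.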

Pipeline with Stalling Unit
  Forwarding controls the ALU inputs; hazard detection controls PC, IF/ID, and the control signals

Example 4: Cycles 2-7 [figures: pipeline with stalling unit, one per cycle]

Outline
  6.1 An Overview of Pipelining
  6.2 A Pipelined Datapath
  6.3 Pipelined Control
  6.4 Data Hazards and Forwarding
  6.5 Data Hazards and Stalls
  6.6 Branch Hazards
  6.8 Exceptions (optional)
  6.9 Superscalar and dynamic pipelining (optional)

Branch Hazards
  When we decide to branch, other instructions are already in the pipeline!

Handling Branch Hazard
  Predict branch always not taken
    Need to add hardware for flushing instructions if wrong
    Branch decision made at MEM => need to flush instructions in IF, ID, EX by changing control values to 0
  Reduce delay of taken branch by moving branch execution earlier in the pipeline
    Move up branch address calculation to ID
    Check branch equality at ID (using XOR) by comparing the two registers read during ID
    Branch decision made at ID => one instruction to flush
    Add a control signal, IF.Flush, to zero the instruction field of IF/ID => making the instruction a NOP
  Dynamic branch prediction
  Compiler rescheduling, delayed branch

Delayed Branch
  Predict-not-taken + branch decision at ID => the following instruction is always executed => branches take effect 1 cycle later
  0 extra clock cycles per branch instruction if an instruction can be found to put in the slot (≈50% of the time)
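The flush counts above, and their cost, can be estimated with a short Python sketch. It assumes predict-not-taken, so only taken branches pay a penalty, and a base CPI of 1; the branch frequency and taken rate used in the demo are made-up numbers for illustration, not measurements from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def flushed_on_taken(decision_stage):
    """Instructions fetched after the branch that are discarded when it is taken.

    One younger instruction sits in every stage before the one where the branch is resolved.
    """
    return STAGES.index(decision_stage)


def effective_cpi(branch_fraction, taken_rate, decision_stage):
    """Base CPI of 1 plus the average flush penalty per instruction."""
    return 1 + branch_fraction * taken_rate * flushed_on_taken(decision_stage)


if __name__ == "__main__":
    for stage in ("MEM", "ID"):
        cpi = effective_cpi(branch_fraction=0.2, taken_rate=0.5, decision_stage=stage)
        print(f"decision at {stage}: flush {flushed_on_taken(stage)}, CPI about {cpi:.2f}")
```

A delayed branch goes one step further: when the compiler finds a useful instruction for the slot, even the single remaining flush cycle does useful work.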

Pipeline with Flushing

Example 5: Cycles 3-4 [figures: pipeline with flushing, one per cycle]

Summary
  Pipelines pass control information down the pipe just as data moves down the pipe
  Forwarding/stalls handled by local control
  Exceptions stop the pipeline
  MIPS instruction set architecture made pipeline visible (delayed branch, delayed load)
  More performance from deeper pipelines, parallelism