Computer Organization and Architecture


Advanced Computer Architecture: Instruction-Level Parallel Processing (Lecture 2). Cheng Xu (程旭), March 5, 2012

Review: the three kinds of data hazards
Consider instruction sequences of the form  r_k <- (r_i) op (r_j).
True data dependence, which causes a Read-after-Write (RAW) hazard:
  r3 <- (r1) op (r2)
  r5 <- (r3) op (r4)
Anti-dependence, which causes a Write-after-Read (WAR) hazard:
  r3 <- (r1) op (r2)
  r1 <- (r4) op (r5)
Output dependence, which causes a Write-after-Write (WAW) hazard:
  r3 <- (r1) op (r2)
  r3 <- (r6) op (r7)
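The three definitions above can be written down as a small check on a pair of instructions. This is only an illustrative sketch: the `Instr` tuple and the `hazards` function are names invented here, not part of the lecture.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """One instruction of the form dest <- (src) op (src). Hypothetical helper type."""
    dest: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Return the hazard kinds that exist when `second` follows `first` in program order."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")   # second overwrites what first still has to read
    if second.dest == first.dest:
        found.add("WAW")   # both write the same register
    return found


# The three example pairs from the slide.
raw = hazards(Instr("r3", ("r1", "r2")), Instr("r5", ("r3", "r4")))
war = hazards(Instr("r3", ("r1", "r2")), Instr("r1", ("r4", "r5")))
waw = hazards(Instr("r3", ("r1", "r2")), Instr("r3", ("r6", "r7")))
```

The check is purely pairwise; it does not account for an intervening instruction that rewrites the same register.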

Data hazard example
        dest  src1  src2
  I1  DIVD   f6,   f6,   f4
  I2  LD     f2,   45(r3)
  I3  MULTD  f0,   f2,   f4
  I4  DIVD   f8,   f6,   f2
  I5  SUBD   f10,  f0,   f6
  I6  ADDD   f6,   f8,   f2
Hazards to identify in this sequence: RAW hazards, WAR hazards, WAW hazards.

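Complex instruction pipelines
[Figure: pipeline with IF, ID, Issue and WB stages; functional units ALU, Mem, Fadd, Fmul and Fdiv; register files GPRs and FPRs]
Pipelines become more complex in pursuit of higher performance, because of:
  long-latency pipelined floating-point units
  multiple functional units and memory units
  memory systems with variable access time
  precise interrupts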

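Complex in-order pipeline
[Figure: PC, Inst. Mem, Decode (D), GPRs and FPRs feeding an integer path (X1, X2 with Data Mem, X3, W), an Fadd path (X1, X2, X3, W), an Fmul path (X2, X3) and an unpipelined FDiv path (X2, X3); the commit point sits at the W stage]
Delay writeback so that every operation has the same latency to the W stage.
Write ports are never oversubscribed (one instruction in and one instruction out every cycle).
Instructions commit in order, which simplifies the implementation of precise interrupts.
How can the growing writeback latency be kept from slowing down single-cycle integer operations? By bypassing.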

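Complex instruction pipelines
[Figure: pipeline with IF, ID, Issue and WB stages; functional units ALU, Mem, Fadd, Fmul and Fdiv; register files GPRs and FPRs]
Can write hazards be resolved without equalizing the depth of all pipelines and without bypass circuitry?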

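When is it safe to issue an instruction?
Assume a single data structure tracks the state of every instruction in every functional unit.
Before the Issue stage dispatches an instruction, it must check:
  Is the required functional unit available?
  Is the input data available? (RAW?)
  Is it safe to write the destination? (WAR? WAW?)
  Will there be a structural hazard at the WB stage?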

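Hardware schemes for instruction parallelism
Why have hardware support at run time?
  Some dependences cannot really be determined at compile time.
  The compiler becomes simpler.
  Code generated for one machine runs efficiently on another.
Key idea: allow instructions behind a stall to proceed.
  DIVD F0,F2,F4
  ADDD F10,F0,F8
  SUBD F12,F8,F14
This allows out-of-order execution => out-of-order completion.
In the CDC 6600 of 1963, the ID stage checks for structural hazards and the scoreboard tracks data hazards.
Key idea: register renaming.
  DIVD F0,F2,F4
  ADDD F10,F0,F8
  SUBD F0,F8,F14
  MULD F6,F10,F0
Renaming eliminates the WAR and WAW hazards:
  DIVD F0,F2,F4
  ADDD F10,F0,F8
  SUBD F100,F8,F14
  MULD F6,F10,F100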
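The renaming step on this slide can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the mechanism of any particular processor: it renames a destination only when that register has already been read or written earlier in the sequence, and it draws fresh names from a hypothetical pool starting at F100 (chosen to match the slide). Real renaming hardware typically allocates a new physical register for every destination.

```python
from typing import Dict, List, Set, Tuple

# (opcode, dest, src1, src2); a plain tuple is used only for illustration.
Instr = Tuple[str, str, str, str]


def rename(program: List[Instr], first_free: int = 100) -> List[Instr]:
    """Rename redefined registers so that WAR and WAW hazards disappear."""
    latest: Dict[str, str] = {}   # architectural register -> name holding its newest value
    seen: Set[str] = set()        # registers already read or written by earlier instructions
    next_free = first_free
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources must read the newest version of each register (preserves RAW).
        new_src1 = latest.get(src1, src1)
        new_src2 = latest.get(src2, src2)
        if dest in seen:
            # Writing an already used register would create WAR or WAW: take a fresh name.
            new_dest = f"F{next_free}"
            next_free += 1
        else:
            new_dest = dest
        latest[dest] = new_dest
        seen.update((dest, src1, src2))
        renamed.append((op, new_dest, new_src1, new_src2))
    return renamed


slide_program = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F0", "F8", "F14"),
    ("MULD", "F6", "F10", "F0"),
]
renamed_program = rename(slide_program)
```

On the slide's four instructions this yields SUBD F100,F8,F14 and MULD F6,F10,F100, while the true dependence of ADDD on the F0 written by DIVD is left intact.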

Internal components of a superscalar processor
[Figure: block diagram containing the following components]
  Instruction Fetch Unit, I-cache, BHT, BTAC, MMU, Instruction Buffer
  Instruction Decode and Register Rename Unit
  Instruction Issue Unit
  Branch Unit, Integer Unit(s), Floating-Point Unit(s), Load/Store Unit
  General Purpose Registers, Floating-Point Registers, Rename Registers
  Reorder Buffer, Retire Unit
  D-cache, MMU
  Bus Interface Unit: 32 (64)-bit data bus, 32 (64)-bit address bus, control bus

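Superscalar pipeline
[Figure: Fetch -> Decode and rename -> instruction window -> Issue -> Execute (four units in parallel) -> Retire and write back]
Instructions are delivered in order to an out-of-order execution core!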

Scoreboard for in-order issue
Busy[FU#]: a bit-vector that indicates each FU's availability (FU = Int, Add, Mult, Div). These bits are hardwired to the FUs.
WP[reg#]: a bit-vector that records the registers for which writes are pending. These bits are set to true by the Issue stage and set to false by the WB stage.
Issue checks the instruction (opcode dest src1 src2) against the scoreboard (Busy and WP) before dispatching:
  FU available?   check Busy[FU#]
  RAW?            check WP[src1] or WP[src2]
  WAR?            cannot arise
  WAW?            check WP[dest]
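The issue check on this slide amounts to three lookups. The class below is a minimal sketch under that reading; the class and method names are invented for illustration, and a Python set stands in for the WP bit-vector.

```python
from typing import Dict, Iterable, Set


class InOrderScoreboard:
    """Busy bits per functional unit plus a write-pending (WP) set of registers."""

    def __init__(self, units: Iterable[str]) -> None:
        self.busy: Dict[str, bool] = {u: False for u in units}
        self.wp: Set[str] = set()

    def can_issue(self, fu: str, dest: str, src1: str, src2: str) -> bool:
        if self.busy[fu]:
            return False                      # structural hazard: FU not available
        if src1 in self.wp or src2 in self.wp:
            return False                      # RAW: a source still has a pending write
        if dest in self.wp:
            return False                      # WAW: the destination has a pending write
        return True                           # WAR cannot arise with in-order issue

    def issue(self, fu: str, dest: str) -> None:
        self.busy[fu] = True                  # set by the Issue stage
        self.wp.add(dest)

    def write_back(self, fu: str, dest: str) -> None:
        self.busy[fu] = False                 # cleared by the WB stage
        self.wp.discard(dest)


sb = InOrderScoreboard(["Int", "Add", "Mult", "Div"])
sb.issue("Div", "F0")                         # DIVD F0,F2,F4 is in flight
```

With DIVD F0,F2,F4 in flight, an ADDD that reads F0 is held back by the RAW check, and a second instruction writing F0 is held back by the WAW check, until `write_back("Div", "F0")` is called.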

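Hardware schemes for instruction parallelism (continued)
Out-of-order execution splits the ID stage in two:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
As soon as an instruction satisfies both conditions, the scoreboard lets it execute without waiting for earlier instructions to complete.
CDC 6600: in-order issue, out-of-order execution, out-of-order commit (that is, completion).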

CDC 6600 logic gates

CDC 6600 block diagram

Scoreboard architecture
[Figure: Registers connected to the Functional Units (FP Mult, FP Mult, FP Divide, FP Add, Integer) and to Memory, all under control of the SCOREBOARD]

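Implications of the scoreboard
Out-of-order completion => WAR and WAW hazards?
Solutions for WAR:
  queue the operations together with copies of their operands
  read registers only during the Read Operands stage
Solution for WAW: the hazard must be detected, and the instruction stalls until the other instruction completes.
Several instructions may be in the execution phase at once => multiple execution units or pipelined execution units.
The scoreboard keeps track of dependences and of the state of operations.
The scoreboard replaces the three stages ID, EX, WB with four stages.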

Four stages of scoreboard control
1. Issue: decode instructions and check for structural hazards (ID1)
If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.
2. Read operands: wait until no data hazards, then read operands (ID2)
A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

Four stages of scoreboard control (continued)
3. Execution: operate on operands (EX)
The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WB)
Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.
Example:
  DIVD F0,F2,F4
  ADDD F10,F0,F8
  SUBD F8,F8,F14
The CDC 6600 scoreboard would stall SUBD until ADDD reads its operands.
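The WAR check performed in the Write result stage can be expressed as a single predicate over the other functional units. The following is a sketch only: each unit is represented by a plain dictionary with the field names Fj, Fk, Rj, Rk, and the function name `can_write` is invented here. An R flag of True means the operand is ready but has not been read yet.

```python
from typing import Dict, Iterable, Union

FUEntry = Dict[str, Union[str, bool]]


def can_write(dest: str, others: Iterable[FUEntry]) -> bool:
    """True if no other unit still has to read the old value of `dest`."""
    return all(
        (f["Fj"] != dest or not f["Rj"]) and (f["Fk"] != dest or not f["Rk"])
        for f in others
    )


# ADDD F10,F0,F8 while DIVD is still computing F0: F8 is ready but unread.
addd_waiting = {"Fj": "F0", "Fk": "F8", "Rj": False, "Rk": True}
# The same ADDD after it has read its operands: both flags are cleared.
addd_after_read = {"Fj": "F0", "Fk": "F8", "Rj": False, "Rk": False}

subd_blocked = not can_write("F8", [addd_waiting])
subd_released = can_write("F8", [addd_after_read])
```

In the slide's example this is why SUBD F8,F8,F14 may finish executing early but cannot write F8 until ADDD has read it.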

The three main parts of the scoreboard
1. Instruction status: which of the 4 steps the instruction is in.
2. Functional unit status: indicates the state of the functional unit (FU). There are 9 fields for each functional unit:
  Busy     indicates whether the unit is busy or not
  Op       operation to perform in the unit (e.g., + or -)
  Fi       destination register
  Fj, Fk   source-register numbers
  Qj, Qk   functional units producing source registers Fj, Fk
  Rj, Rk   flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

Details of scoreboard pipeline control
Issue
  Wait until:   not Busy(FU) and not Result(D)
  Bookkeeping:  Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2; Qj ← Result(S1); Qk ← Result(S2); Rj ← not Qj; Rk ← not Qk; Result(D) ← FU
Read operands
  Wait until:   Rj and Rk
  Bookkeeping:  Rj ← No; Rk ← No
Execution complete
  Wait until:   functional unit done
Write result
  Wait until:   ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping:  ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No
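As a rough illustration of the Issue row, the sketch below models the functional unit status and the register result status as Python dataclasses and applies the Issue bookkeeping. All class and function names are assumptions made for this example, and an absent operand (such as the unused Fj of a load) is simply treated as ready.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FUStatus:
    """The nine per-unit fields: Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False


def _default_units() -> Dict[str, FUStatus]:
    return {name: FUStatus() for name in ("Integer", "Mult1", "Add", "Divide")}


@dataclass
class Scoreboard:
    fu: Dict[str, FUStatus] = field(default_factory=_default_units)
    result: Dict[str, str] = field(default_factory=dict)   # register -> unit that will write it


def issue(sb: Scoreboard, unit: str, op: str, dest: str,
          src1: Optional[str], src2: Optional[str]) -> bool:
    """Apply the Issue row: wait condition first, then the bookkeeping."""
    if sb.fu[unit].busy or dest in sb.result:
        return False                       # structural or WAW hazard: stall
    qj = sb.result.get(src1)               # Qj <- Result(S1)
    qk = sb.result.get(src2)               # Qk <- Result(S2)
    sb.fu[unit] = FUStatus(True, op, dest, src1, src2, qj, qk,
                           rj=qj is None, rk=qk is None)   # Rj <- not Qj; Rk <- not Qk
    sb.result[dest] = unit                 # Result(D) <- FU
    return True


board = Scoreboard()
issue(board, "Integer", "Load", "F2", None, "R3")    # LD F2, 45+R3
issue(board, "Mult1", "Mult", "F0", "F2", "F4")      # MULTD F0, F2, F4
```

After these two calls the Mult1 entry records Qj = Integer with Rj = No and Rk = Yes, and the register result status maps F2 to Integer and F0 to Mult1.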

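Scoreboard example: initial state
Latencies: ADD 2 cycles, Mult 10 cycles, Divd 40 cycles
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    -      -     -     -
  LD    F2, 45+R3    -      -     -     -
  MULTD F0, F2, F4   -      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Add      No
  -     Divide   No
Register result status: no pending writes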

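Scoreboard example: cycle 1
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      -     -     -
  LD    F2, 45+R3    -      -     -     -
  MULTD F0, F2, F4   -      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
  -     Mult1    No
  -     Add      No
  -     Divide   No
Register result status (clock 1): F6 <- Integer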

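Scoreboard example: cycle 2
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     -     -
  LD    F2, 45+R3    -      -     -     -
  MULTD F0, F2, F4   -      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
  -     Mult1    No
  -     Add      No
  -     Divide   No
Register result status (clock 2): F6 <- Integer
Question: can the second LD issue? (Not yet: the Integer unit is still busy.)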

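Scoreboard example: cycle 3
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     -
  LD    F2, 45+R3    -      -     -     -
  MULTD F0, F2, F4   -      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F6   -   R2  -        -        -    No
  -     Mult1    No
  -     Add      No
  -     Divide   No
Register result status (clock 3): F6 <- Integer
Question: can MULTD issue? (No: issue is in order, and the second LD is still waiting for the Integer unit.)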

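Scoreboard example: cycle 4
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    -      -     -     -
  MULTD F0, F2, F4   -      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Add      No
  -     Divide   No
Register result status (clock 4): no pending writes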

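Scoreboard example: cycle 5
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      -     -     -
  MULTD F0, F2, F4   -      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
  -     Mult1    No
  -     Add      No
  -     Divide   No
Register result status (clock 5): F2 <- Integer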

Scoreboard example: cycle 6
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     -     -
  MULTD F0, F2, F4   6      -     -     -
  SUBD  F8, F6, F2   -      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
  -     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
  -     Add      No
  -     Divide   No
Register result status (clock 6): F0 <- Mult1, F2 <- Integer

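Scoreboard example: cycle 7
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     -
  MULTD F0, F2, F4   6      -     -     -
  SUBD  F8, F6, F2   7      -     -     -
  DIVD  F10, F0, F6  -      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F2   -   R3  -        -        -    No
  -     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
  -     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
  -     Divide   No
Register result status (clock 7): F0 <- Mult1, F2 <- Integer, F8 <- Add
Question: can the multiply read its operands? (No: F2 is still being produced by the Integer unit.)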

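Scoreboard example: cycle 8a (first half of the cycle)
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     -
  MULTD F0, F2, F4   6      -     -     -
  SUBD  F8, F6, F2   7      -     -     -
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  Yes   Load  F2   -   R3  -        -        -    No
  -     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
  -     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 8): F0 <- Mult1, F2 <- Integer, F8 <- Add, F10 <- Divide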

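Scoreboard example: cycle 8b (second half of the cycle)
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      -     -     -
  SUBD  F8, F6, F2   7      -     -     -
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
  -     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 8): F0 <- Mult1, F8 <- Add, F10 <- Divide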

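Scoreboard example: cycle 9
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     -     -
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  10    Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
  2     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 9): F0 <- Mult1, F8 <- Add, F10 <- Divide
Questions: read operands for MULTD and SUBD? Issue ADDD? (Both read their operands in this cycle; ADDD cannot issue because the Add unit is busy with SUBD.)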

Scoreboard example: cycle 10
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     -     -
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  9     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  1     Add      Yes   Sub   F8   F6  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 10): F0 <- Mult1, F8 <- Add, F10 <- Divide

Scoreboard example: cycle 11
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    -
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  8     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  0     Add      Yes   Sub   F8   F6  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 11): F0 <- Mult1, F8 <- Add, F10 <- Divide

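Scoreboard example: cycle 12
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   -      -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  7     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  -     Add      No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 12): F0 <- Mult1, F10 <- Divide
Question: read operands for DIVD? (No: F0 is still pending from Mult1.)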

Scoreboard example: cycle 13
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     -     -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  6     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  -     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 13): F0 <- Mult1, F6 <- Add, F10 <- Divide

Scoreboard example: cycle 14
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  5     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  2     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 14): F0 <- Mult1, F6 <- Add, F10 <- Divide

Scoreboard example: cycle 15
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    -     -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  4     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  1     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 15): F0 <- Mult1, F6 <- Add, F10 <- Divide

Scoreboard example: cycle 16
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    16    -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  3     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  0     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 16): F0 <- Mult1, F6 <- Add, F10 <- Divide

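Scoreboard example: cycle 17
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    16    -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  2     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  -     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 17): F0 <- Mult1, F6 <- Add, F10 <- Divide
WAR hazard! Question: can ADDD write its result? (No: DIVD has not yet read F6.)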

Scoreboard example: cycle 18
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     -     -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    16    -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  1     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  -     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 18): F0 <- Mult1, F6 <- Add, F10 <- Divide

Scoreboard example: cycle 19
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     19    -
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    16    -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  0     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
  -     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes
Register result status (clock 19): F0 <- Mult1, F6 <- Add, F10 <- Divide

Scoreboard example: cycle 20
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     19    20
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      -     -     -
  ADDD  F6, F8, F2   13     14    16    -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes
Register result status (clock 20): F6 <- Add, F10 <- Divide

Scoreboard example: cycle 21
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     19    20
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      21    -     -
  ADDD  F6, F8, F2   13     14    16    -
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Add      Yes   Add   F6   F8  F2  -        -        No   No
  -     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes
Register result status (clock 21): F6 <- Add, F10 <- Divide

Scoreboard example: cycle 22
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     19    20
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      21    -     -
  ADDD  F6, F8, F2   13     14    16    22
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Add      No
  40    Divide   Yes   Div   F10  F0  F6  -        -        No   No
Register result status (clock 22): F10 <- Divide

Scoreboard example: cycle 61
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     19    20
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      21    61    -
  ADDD  F6, F8, F2   13     14    16    22
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Add      No
  0     Divide   Yes   Div   F10  F0  F6  -        -        No   No
Register result status (clock 61): F10 <- Divide

Scoreboard example: cycle 62
Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6, 34+R2    1      2     3     4
  LD    F2, 45+R3    5      6     7     8
  MULTD F0, F2, F4   6      9     19    20
  SUBD  F8, F6, F2   7      9     11    12
  DIVD  F10, F0, F6  8      21    61    62
  ADDD  F6, F8, F2   13     14    16    22
Functional unit status:
  Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
  -     Integer  No
  -     Mult1    No
  -     Mult2    No
  -     Add      No
  0     Divide   No
Register result status (clock 62): no pending writes
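The cycle-by-cycle walkthrough above can be reproduced with a small simulator. This is a sketch under several assumptions rather than a model of the real CDC 6600: every decision in a cycle is taken from the state at the start of that cycle, one instruction issues per cycle at most, the load latency of 1 cycle is inferred from the LD rows of the example, and all names (`Instr`, `simulate`, `make_program`) are invented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LATENCY = {"LD": 1, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}  # LD latency is an assumption
UNIT = {"LD": "Integer", "ADDD": "Add", "SUBD": "Add", "MULTD": "Mult1", "DIVD": "Divide"}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...]
    issue: int = 0   # cycle of each stage; 0 means "not reached yet"
    read: int = 0
    done: int = 0
    write: int = 0
    q: Dict[str, Optional[str]] = field(default_factory=dict)  # source -> unit producing it


def simulate(program: List[Instr]) -> List[Tuple[int, int, int, int]]:
    fu: Dict[str, Instr] = {}      # busy functional units
    result: Dict[str, str] = {}    # register result status
    nxt, cycle = 0, 0
    while nxt < len(program) or fu:
        cycle += 1
        to_issue, actions = None, []
        if nxt < len(program):     # in-order issue: structural and WAW checks
            cand = program[nxt]
            if UNIT[cand.op] not in fu and cand.dest not in result:
                to_issue = cand
        for unit, ins in fu.items():
            if not ins.read:
                if all(p is None for p in ins.q.values()):       # Rj and Rk
                    actions.append(("read", unit, ins))
            elif not ins.done:
                if cycle == ins.read + LATENCY[ins.op]:
                    actions.append(("done", unit, ins))
            else:                                                # WAR check before writing
                war = any(o is not ins and not o.read and ins.dest in o.q
                          and o.q[ins.dest] is None for o in fu.values())
                if not war:
                    actions.append(("write", unit, ins))
        if to_issue is not None:   # apply the issue before any write of this cycle
            unit = UNIT[to_issue.op]
            to_issue.issue = cycle
            to_issue.q = {s: result.get(s) for s in to_issue.srcs}
            fu[unit] = to_issue
            result[to_issue.dest] = unit
            nxt += 1
        for kind, unit, ins in actions:
            if kind == "read":
                ins.read = cycle
            elif kind == "done":
                ins.done = cycle
            else:
                ins.write = cycle
                for other in fu.values():                        # wake up waiting readers
                    for s in list(other.q):
                        if other.q[s] == unit:
                            other.q[s] = None
                del result[ins.dest]
                del fu[unit]
    return [(i.issue, i.read, i.done, i.write) for i in program]


def make_program() -> List[Instr]:
    """The six instructions of the example, as fresh objects."""
    return [Instr("LD", "F6", ("R2",)), Instr("LD", "F2", ("R3",)),
            Instr("MULTD", "F0", ("F2", "F4")), Instr("SUBD", "F8", ("F6", "F2")),
            Instr("DIVD", "F10", ("F0", "F6")), Instr("ADDD", "F6", ("F8", "F2"))]


timeline = simulate(make_program())
```

Each tuple in `timeline` holds the Issue, Read operands, Execution complete and Write result cycles of one instruction, in program order, and can be compared against the instruction status table of cycle 62.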

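The CDC 6600 scoreboard
Speedup of 1.7 from compiled code and 2.5 from hand-written code, but slow memory (no cache) limited further speedup.
Limitations of the 6600 scoreboard:
  no forwarding hardware
  instruction scheduling limited to a basic block (small instruction window)
  few functional units (structural hazards), especially integer and load/store units
  instruction issue stalls whenever a structural hazard exists
  instructions wait until WAR hazards are resolved
  WAW hazards are prevented by stalling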

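Summary of this lecture
Instruction-level parallelism (ILP) in software or hardware.
Loop-level parallelism is the easiest to identify.
Software parallelism depends on the program; hazards arise if the hardware cannot support it.
Software dependences and compiler sophistication determine whether the compiler can unroll loops.
Memory dependences are the hardest to determine.
Exploiting ILP in hardware:
  some dependences cannot really be determined at compile time
  code generated for one machine runs efficiently on another
Key ideas of the scoreboard:
  allow instructions behind a stall to proceed (Decode => Issue instruction and Read operands)
  allow out-of-order execution => out-of-order completion
  the ID stage checks for all structural hazards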