Microsoft PowerPoint - CA_02 Chapter5 Part-I_Single _V2.ppt

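Chapter 5 - The Processor: Datapath and Control (Single-cycle implementation)
Prof. An-Yeu Wu (吳安宇), Department of Electrical Engineering, National Taiwan University
V. 3/27/27 V2. 3/29/27 For 27 DSD Course
Prof. An-Yeu Wu, NTU EE - Computer Architecture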

Outline
5.1 Introduction
5.2 Logic Design Conventions
5.3 Building a Datapath
5.4 A Simple Implementation Scheme

Introduction
Show key issues in creating datapaths and designing controls.
Design and implement the MIPS instructions including:
(1) memory-reference instructions: lw, sw
(2) arithmetic-logical instructions: add, sub, and, or, slt
(3) branch instructions: beq, j
Guidelines in hardware implementation:
(1) Make the common case fast
(2) Simplicity favors regularity

Overview of the implementation
For every instruction, the first two steps are the same:
Fetch: send the Program Counter (PC) to the memory that contains the code (Instruction Fetch).
Read registers: use fields of the instruction to select the registers to read.
Load: read one register
Others (e.g., R-type, store, beq): read two registers
lw $s, 2($s2)
add $t, $s, $s2

Common actions for instruction types
Common actions for the three instruction types (all instructions use the ALU after reading registers):
(1) Memory-reference instructions: use the ALU to calculate the effective address
(ex) lw $t offset($s5) -- compute offset + $s5
(2) Arithmetic-logical instructions: use the ALU to execute the operation: add, sub, and, or
(3) Branch instructions: use the ALU for comparison
(ex) bne/slt $s, $s2 ($s-$s2, and check the sign of the result)

After using ALU
After using the ALU:
1) Memory-reference instructions: need to access the memory containing the data to complete a load operation, or store a word to that memory location.
2) Arithmetic-logical instructions: write the result of the ALU back into a destination register.
3) Branch instructions: need to change the next instruction address based on the comparison (change of PC).

Typical Instruction Execution (figure)

An abstract view of MIPS CPU
An abstract view of the implementation of the MIPS subset showing the major functional units and the major connections between them.
(Figure: PC, instruction memory, register file, ALU, data memory, and two adders, one of which adds 4 to the PC.)

Outline
5.1 Introduction
5.2 Logic Design Conventions (skipped)
5.3 Building a Datapath
5.4 A Simple Implementation Scheme

Logic Design Conventions
Latch vs. Flip-flop
Output is equal to the stored value inside the element (there is no need to ask for permission to look at the value).
Change of state (value) is based on the clock.
Latch: state changes whenever the inputs change and the clock is asserted.
Flip-flop: state changes only on a clock edge.

Logic Design Conventions
D-latch
Two inputs: the data value to be stored (D), and the clock signal (C) indicating when to read and store D.
Two outputs: the value of the internal state (Q) and its complement.

Logic Design Conventions
D flip-flop (rising/falling edge)
Output changes only on the clock edge.
(ex) Figure: a D flip-flop with a falling-edge trigger.
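
The difference between a latch and a flip-flop can be sketched behaviorally in Python. This is an illustration only, not taken from the slides: the class names `DLatch` and `DFlipFlop`, the `update` method, and the sample sequence are hypothetical, and the flip-flop is assumed to be falling-edge triggered, with D sampled at the moment the edge is observed.

```python
class DLatch:
    """Level-sensitive: Q follows D whenever the clock C is asserted (1)."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c == 1:
            self.q = d
        return self.q


class DFlipFlop:
    """Falling-edge triggered: Q takes the value of D only when C goes 1 -> 0."""

    def __init__(self):
        self.q = 0
        self._prev_c = 0  # clock level seen on the previous call

    def update(self, d, c):
        if self._prev_c == 1 and c == 0:
            self.q = d
        self._prev_c = c
        return self.q


# Hypothetical (D, C) samples: D toggles while the clock is high,
# then the clock falls while D is 1.
samples = [(1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0)]
latch, ff = DLatch(), DFlipFlop()
latch_trace = [latch.update(d, c) for d, c in samples]
ff_trace = [ff.update(d, c) for d, c in samples]
```

In this sketch the latch output tracks every change of D while C is 1, whereas the flip-flop output changes once, at the falling edge.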

Logic Design Conventions
The functional units in the MIPS implementation consist of two different types of logic elements:
(1) combinational element: the outputs depend only on the current inputs.
(2) state element: it contains state, that is, it has some internal storage.
Clocking methodology: defines when signals can be read and when they can be written.
Control signal: used for multiplexer selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.

Logic Design Conventions
Our implementation: an edge-triggered methodology
Typical execution:
read the contents of some state elements,
send the values through some combinational logic,
write the results to one or more state elements.

Building a Datapath
Basic elements for fetching instructions (Instruction Fetch):
(a) Instruction memory unit
(b) Program Counter (PC): increased by 4 each time
(c) Adder: to perform the increase by 4

Building a Datapath for fetching instructions
Figure: a portion of the datapath used for fetching instructions and incrementing the program counter.
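
The fetch portion of the datapath can be sketched as a small Python function. This is a minimal sketch under stated assumptions: instruction memory is modeled as a Python list of 32-bit words indexed by `pc // 4`, and the name `fetch` and the two sample instruction words are hypothetical.

```python
def fetch(instruction_memory, pc):
    """Read the instruction at byte address pc and compute PC + 4.

    instruction_memory is assumed to be a list of 32-bit words,
    so word i lives at byte address 4 * i.
    """
    instruction = instruction_memory[pc // 4]
    next_pc = (pc + 4) & 0xFFFFFFFF  # the adder; the PC is 32 bits wide
    return instruction, next_pc


# Two arbitrary instruction words used only for illustration.
imem = [0x8D490000, 0x01295820]
first = fetch(imem, 0)
second = fetch(imem, first[1])
```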

Building a Datapath for R-type instructions
Function:
(1) read two registers
(2) perform an ALU operation on the contents of the registers
(3) write the result back into the destination register
Basic elements for R-type instructions:
Read operation:
(1) an input to the register file to specify the index of the register to be read.
(2) an output carrying the register contents.
Write operation:
(1) an input to the register file to specify the index of the register to be written.
(2) an input to supply the data to be written into the specified register.

Building a Datapath (R-type)
Elements which we need:
(a) Register file: a collection of registers in which any register can be read or written by specifying the index of the register in the file.
(b) ALU (32 bits): operates on the values read from the registers.

Datapath Design for R-type Instructions
(Figure: register file and ALU, driven by instruction fields [25:21], [20:16], and [15:11].)

Building a Datapath for lw/sw Instructions
Basic elements for load/store instructions:
(a) Data memory unit: read/write data
(b) Sign-extend unit: sign-extends the 16-bit offset field in the instruction to a 32-bit signed value.
(c) Register file
(d) ALU (adds register + offset to compute the memory address)
-- (c) and (d) are the same as shown on the previous slide.
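
The sign-extend unit and the address calculation can be sketched as follows. This is a minimal sketch, not the slides' own code: the function names `sign_extend16` and `effective_address` are hypothetical, and addresses are assumed to wrap around at 32 bits.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """ALU add for lw/sw: base register + sign-extended 16-bit offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# 0xFFFC is -4 as a 16-bit two's-complement value.
address = effective_address(0x1000, 0xFFFC)
```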

Datapath for lw instructions
(Figure labels: instruction fields [25:21], [20:16], [15:0] offset; New Mem Address; Data from Reg2.)

Datapath for sw instructions
(Figure labels: instruction fields [25:21], [20:16], [15:0] offset; New Mem Address; Data from Reg2.)

Branch Instructions
(ex) beq $t1, $t2, offset # if ($t1 == $t2) goto (PC + 4 + (offset << 2)), else execute the next instruction
Functions:
(1) The offset field is shifted left 2 bits so that it's a word offset.
(2) Branch is taken: when the condition is true, the branch target address becomes the new PC.
(3) Branch isn't taken: the incremented PC (PC + 4) replaces the current PC, just as for a normal instruction.
Operations:
(1) Compute the branch target address.
(2) Compare the contents of the two registers.
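
The two branch operations can be sketched in Python. This is a minimal sketch assuming the standard MIPS convention that the target is PC + 4 plus the sign-extended offset shifted left by 2; the function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Operation (1): PC + 4 + (sign-extended offset << 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def next_pc_for_beq(pc, rs_value, rt_value, offset16):
    """Operation (2): compare by subtraction, then pick the new PC."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF


taken = next_pc_for_beq(0x100, 5, 5, 3)      # registers equal
not_taken = next_pc_for_beq(0x100, 5, 6, 3)  # registers differ
```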

Datapath for beq Instructions
beq $t1, $t2, offset
(Figure: instruction fields [25:21], [20:16], and the [15:0] offset.)

Datapath for both Memory and R-type Instructions (Fig. 5.)

Simple Datapath for All three types of Instructions (Fig. 5.)

Basic datapath with control (figure)

Design of ALU control unit
Depending on the instruction type, the ALU will perform:
lw/sw: compute the memory address by addition
R-type (add, sub, AND, OR, slt): an operation that depends on the value of the 6-bit function field
Branch (beq): subtraction (R1 - R2)
ALU control signals:
ALU control lines | Function
0000 | AND
0001 | OR
0010 | add
0110 | subtract
0111 | set on less than
1100 | NOR

ALU control for each type of instruction
Instruction opcode | ALUOp | Instruction operation | Funct field | Desired ALU action | ALU control input
LW | 00 | load word | xxxxxx | add | 0010
SW | 00 | store word | xxxxxx | add | 0010
Branch equal | 01 | branch equal | xxxxxx | subtract | 0110
R-type | 10 | add | 100000 | add | 0010
R-type | 10 | subtract | 100010 | subtract | 0110
R-type | 10 | AND | 100100 | and | 0000
R-type | 10 | OR | 100101 | or | 0001
R-type | 10 | set on less than | 101010 | set on less than | 0111

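A Simple Implementation Scheme
The truth table for the three ALU control bits (called Operation)
ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation
0 | 0 | x | x | x | x | x | x | 010
x | 1 | x | x | x | x | x | x | 110
1 | x | x | x | 0 | 0 | 0 | 0 | 010
1 | x | x | x | 0 | 0 | 1 | 0 | 110
1 | x | x | x | 0 | 1 | 0 | 0 | 000
1 | x | x | x | 0 | 1 | 0 | 1 | 001
1 | x | x | x | 1 | 0 | 1 | 0 | 111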

Simplify the ALU Control
The ALU control bits are generated from the ALUOp bits and the function code bits.
Because ALU control bit 3 is always 0, omit it.
For each remaining bit (bit 2, bit 1, bit 0), ask: when is this ALU control bit equal to 1? (Figures: logic for each bit.)
ALU control logic (overall) (figure)
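
The simplification can be checked with a short Python sketch that compares a table lookup against gate-level expressions. The expressions used here are the standard textbook ones for this MIPS subset and are an assumption, since the slides' own logic figures are not reproduced; all function and variable names are hypothetical.

```python
# Funct field -> 4-bit ALU control input, for the R-type subset.
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control_table(alu_op, funct):
    """Table form: ALUOp 00 -> add, 01 -> subtract, 10 -> use the funct field."""
    if alu_op == 0b00:
        return 0b0010
    if alu_op == 0b01:
        return 0b0110
    return FUNCT_TO_CONTROL[funct & 0x3F]


def alu_control_gates(alu_op, funct):
    """Gate form of the three low control bits (bit 3 is always 0 here)."""
    op1, op0 = (alu_op >> 1) & 1, alu_op & 1
    f0, f1 = funct & 1, (funct >> 1) & 1
    f2, f3 = (funct >> 2) & 1, (funct >> 3) & 1
    bit2 = op0 | (op1 & f1)
    bit1 = (1 - op1) | (1 - f2)   # NOT ALUOp1 OR NOT F2
    bit0 = op1 & (f3 | f0)
    return (bit2 << 2) | (bit1 << 1) | bit0


# Compare both forms for every supported case.
cases = [(0b00, 0), (0b01, 0)] + [(0b10, f) for f in FUNCT_TO_CONTROL]
agree = all(alu_control_table(op, f) == alu_control_gates(op, f) for op, f in cases)
```

The gate form relies on the don't-care entries of the truth table, so it is only meaningful for the ALUOp and funct combinations that the subset actually uses.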

Designing the Main Control Unit
The two instruction classes (figure)
Observations:
op field: the opcode, bits [31:26], which is called Op[5:0].
The two registers to be read are specified by rs and rt (for R-type, beq).
The base register (for lw, sw) is rs.
The 16-bit offset (for lw, sw, beq) is bits [15:0] (also used for immediate values).
The destination register is in one of two places:
lw: rt, bits [20:16]
R-type: rd, bits [15:11]
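
The field positions listed above can be illustrated with a small decoder. This is a minimal sketch: the function name `decode` and the dictionary keys are hypothetical, and the two sample words are encodings of `add $t1, $t2, $t3` and `lw $t1, 8($t2)` worked out by hand from the standard MIPS formats.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits [31:26]
        "rs": (instruction >> 21) & 0x1F,     # bits [25:21]
        "rt": (instruction >> 16) & 0x1F,     # bits [20:16]
        "rd": (instruction >> 11) & 0x1F,     # bits [15:11]
        "funct": instruction & 0x3F,          # bits [5:0]
        "offset": instruction & 0xFFFF,       # bits [15:0]
    }


add_fields = decode(0x014B4820)  # add $t1, $t2, $t3 ($t1=9, $t2=10, $t3=11)
lw_fields = decode(0x8D490008)   # lw $t1, 8($t2)
```

For the R-type word the destination is `rd`; for the load word it is `rt`, matching the two cases in the observations.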

Simple Datapath with the Control Unit (figure)

Effect of the 7 control signals
Signal name | Effect when deasserted (0) | Effect when asserted (1)
RegDst | The register destination number for the Write register comes from the rt field (bits 20-16). | The register destination number for the Write register comes from the rd field (bits 15-11).
RegWrite | None | The register on the Write register input is written with the value on the Write data input.
ALUSrc | The second ALU operand comes from the second register file output (Read data 2). | The second ALU operand is the sign-extended lower 16 bits of the instruction.
PCSrc | The PC is replaced by the output of the adder that computes the value of PC + 4. | The PC is replaced by the output of the adder that computes the branch target.
MemRead | None | Data memory contents designated by the address input are put on the Read data output.
MemWrite | None | Data memory contents designated by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register Write data input comes from the ALU. | The value fed to the register Write data input comes from the data memory.

Control Unit Design
The setting of the control lines is completely determined by the opcode field of the instruction.

Operation for R-type instruction
The 4 steps of the operation for an R-type instruction
add $t1, $t2, $t3
Fetch instruction and increment PC ( Instr = Memory[PC]; PC = PC + 4 )
Read registers ( Reg1 = Reg[rs], Reg2 = Reg[rt] )
Run the ALU operation ( Result = Reg1 ALUop Reg2 )
Store the result into the Register File ( Reg[rd] = Result )

Single-cycle MIPS with 4 instructions (figure)

Operation for load instruction
The 5 steps of the operation for the load instruction
lw $t1, offset($t2)
Fetch instruction and increment PC ( Instr = Memory[PC]; PC = PC + 4 )
Read registers ( $t2 = Reg[rs]; only one register is read )
Address computing ( Result = $t2 + sign-extend(Instr[15-0]) )
Load data from memory ( Data = Memory[Result] )
Store data into the Register File ( Reg[rt] = Data )

Operation for store instruction
The 4 steps of the operation for the store instruction
sw $t1, offset($t2)
Fetch instruction and increment PC ( Instr = Memory[PC]; PC = PC + 4 )
Read two registers ( Reg1 = Reg[rs], Reg2 = Reg[rt] )
Address computing ( Result = Reg1 + sign-extend(Instr[15-0]) )
Store data into memory ( Memory[Result] = Reg2 )

Operation for beq instruction
The steps of the operation for the beq instruction
beq $t1, $t2, offset
Fetch instruction and increment PC ( Instr = Memory[PC]; PC = PC + 4 )
Read two registers ( Reg1 = Reg[rs], Reg2 = Reg[rt] )
Compute branch target address ( Target = PC + ( sign-extend(Instr[15-0]) << 2 ) )
Run the ALU operation ( Result = Reg1 minus Reg2 )
Observe zero to decide whether to branch ( if zero == 1, then PC = Target; otherwise PC is unchanged )
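
The step lists for R-type, load, store, and beq can be combined into one small behavioral model. This is a minimal sketch, not the slides' design: the `step` function, the state dictionary layout, and the sample program are hypothetical, registers hold unbounded Python integers (32-bit overflow wrapping is omitted), and data memory is a dictionary keyed by byte address.

```python
def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


R_TYPE_OPS = {
    0x20: lambda a, b: a + b,       # add
    0x22: lambda a, b: a - b,       # sub
    0x24: lambda a, b: a & b,       # and
    0x25: lambda a, b: a | b,       # or
    0x2A: lambda a, b: int(a < b),  # slt
}


def step(state, imem):
    """Execute one instruction; state has keys 'pc', 'reg' (32 ints), 'mem' (dict)."""
    pc, reg, mem = state["pc"], state["reg"], state["mem"]
    instr = imem[pc // 4]                      # fetch
    next_pc = pc + 4                           # increment PC
    op = (instr >> 26) & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    imm = sign_extend16(instr)
    reg1, reg2 = reg[rs], reg[rt]              # read registers
    if op == 0x00:                             # R-type: ALU op, write rd
        reg[rd] = R_TYPE_OPS[instr & 0x3F](reg1, reg2)
    elif op == 0x23:                           # lw: address, memory read, write rt
        reg[rt] = mem.get(reg1 + imm, 0)
    elif op == 0x2B:                           # sw: address, memory write
        mem[reg1 + imm] = reg2
    elif op == 0x04:                           # beq: subtract, test zero
        if reg1 - reg2 == 0:
            next_pc = next_pc + (imm << 2)
    else:
        raise ValueError(f"unsupported opcode {op:#x}")
    reg[0] = 0                                 # register 0 always reads as zero
    state["pc"] = next_pc


# Hand-encoded sample program ($t1=9, $t2=10, $t3=11).
program = [
    0x8D490000,  # lw  $t1, 0($t2)
    0x01295820,  # add $t3, $t1, $t1
    0xAD4B0004,  # sw  $t3, 4($t2)
    0x116B0001,  # beq $t3, $t3, 1
]
state = {"pc": 0, "reg": [0] * 32, "mem": {100: 21}}
state["reg"][10] = 100
for _ in range(4):
    step(state, program)
```

After the four steps the model holds 21 in `$t1`, 42 in `$t3`, 42 at memory address 104, and a PC of 20, because the always-taken branch skips one instruction slot.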

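Finalizing the control signals
Input/output | Signal name | R-format | lw | sw | beq
Inputs | Op5 | 0 | 1 | 1 | 0
Inputs | Op4 | 0 | 0 | 0 | 0
Inputs | Op3 | 0 | 0 | 1 | 0
Inputs | Op2 | 0 | 0 | 0 | 1
Inputs | Op1 | 0 | 1 | 1 | 0
Inputs | Op0 | 0 | 1 | 1 | 0
Outputs | RegDst | 1 | 0 | x | x
Outputs | ALUSrc | 0 | 1 | 1 | 0
Outputs | MemtoReg | 0 | 1 | x | x
Outputs | RegWrite | 1 | 1 | 0 | 0
Outputs | MemRead | 0 | 1 | 0 | 0
Outputs | MemWrite | 0 | 0 | 1 | 0
Outputs | Branch | 0 | 0 | 0 | 1
Outputs | ALUOp1 | 1 | 0 | 0 | 0
Outputs | ALUOp0 | 0 | 0 | 0 | 1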
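
The control table can be expressed as a lookup keyed by opcode. This is a minimal sketch using the standard MIPS opcodes and the usual textbook control values for this subset, which are assumptions here because the numeric entries of the slide's table did not survive extraction. The names `CONTROL`, `main_control`, and `pc_src` are hypothetical, and `None` stands for a don't-care entry.

```python
X = None  # don't-care

CONTROL = {
    0b000000: dict(name="R-format", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(name="sw", RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(name="beq", RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Return the control signal settings for a 6-bit opcode."""
    try:
        return CONTROL[opcode]
    except KeyError:
        raise ValueError(f"unsupported opcode {opcode:#08b}") from None


def pc_src(branch, zero):
    """PCSrc is assumed to be Branch AND the ALU's Zero output."""
    return branch & zero


lw_signals = main_control(0b100011)
beq_signals = main_control(0b000100)
```

Only the opcode is consulted, which reflects the point that the control line settings depend on the opcode field alone.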

Datapath for Jump
Jump operation (opcode = 000010):
Replace a portion of the PC (bits 27-0) with the lower 26 bits of the instruction shifted left by 2 bits.
The shift operation is accomplished by simply concatenating 00 to the jump offset.
The upper 4 bits (31-28) of PC + 4 are kept.
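
The jump address formation can be sketched in a few lines of Python. This is a minimal sketch assuming the standard MIPS rule that the upper 4 bits come from PC + 4; the function name `jump_target` and the sample values are hypothetical.

```python
def jump_target(pc, instruction):
    """Concatenate the upper 4 bits of PC + 4 with the 26-bit field and 00."""
    upper = (pc + 4) & 0xF0000000              # bits 31-28 of PC + 4
    lower = (instruction & 0x03FFFFFF) << 2    # 26-bit field followed by 00
    return upper | lower


# j with a 26-bit field of 0x10, executed at address 0x40000000.
target = jump_target(0x40000000, 0x08000010)
```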

Implementing Jumps (figure)

Single-cycle implementation
Why isn't a single-cycle implementation used today?
Long cycle time for each instruction (load takes the longest time)
All instructions take as much time as the slowest one

Performance of single-cycle implementation
Example:
Assumption:
Memory units: 200 ps
ALU and adders: 100 ps
Register file (read / write): 50 ps
Multiplexers, control unit, PC accesses, sign extension unit, and wires have no delay.
The instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15% branches, 5% jumps.
Problem: which one would be faster and by how much?
(1) fixed clock cycle
(2) variable-length clock cycle

Answer: Performance of single-cycle implementation
The critical path for the different instruction classes:
Instruction class | Functional units used by the instruction class
R-type | Instruction fetch, Register access, ALU, Register access
Load word | Instruction fetch, Register access, ALU, Memory access, Register access
Store word | Instruction fetch, Register access, ALU, Memory access
Branch | Instruction fetch, Register access, ALU
Jump | Instruction fetch
Compute the required length for each instruction class:
Instruction class | Instruction memory | Register read | ALU operation | Data memory | Register write | Total
R-type | 200 | 50 | 100 | 0 | 50 | 400 ps
Load word | 200 | 50 | 100 | 200 | 50 | 600 ps
Store word | 200 | 50 | 100 | 200 | | 550 ps
Branch | 200 | 50 | 100 | 0 | | 350 ps
Jump | 200 | | | | | 200 ps

Performance of single-cycle implementation
Calculation equations:
CPU execution time = instruction count * CPI * clock cycle time
Assume CPI = 1, so CPU execution time = instruction count * clock cycle time
Calculate the CPU clock cycle time:
(1) fixed clock cycle: 600 ps
(2) variable-length clock cycle: 600*25% + 550*10% + 400*45% + 350*15% + 200*5% = 447.5 ps
-- The one with the variable-length clock cycle is faster.
Performance ratio: CPU clock cycle (fixed) / CPU clock cycle (variable) = 600 / 447.5 = 1.34
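
The arithmetic of this example can be reproduced with a short Python sketch. It assumes the unit delays of 200 ps (memory), 100 ps (ALU and adders), and 50 ps (register file) and the 25/10/45/15/5 percent instruction mix, which are the values consistent with the 447.5 ps result; the dictionary and variable names are hypothetical.

```python
# Delay of each functional unit in picoseconds.
UNIT_DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Functional units on the critical path of each instruction class.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "beq": ["imem", "reg_read", "alu"],
    "j": ["imem"],
}

# Instruction mix in percent.
MIX_PERCENT = {"lw": 25, "sw": 10, "R-type": 45, "beq": 15, "j": 5}

latency_ps = {cls: sum(UNIT_DELAY_PS[u] for u in units) for cls, units in PATHS.items()}

fixed_cycle_ps = max(latency_ps.values())  # every instruction pays for the slowest
variable_cycle_ps = sum(MIX_PERCENT[c] * latency_ps[c] for c in MIX_PERCENT) / 100
ratio = fixed_cycle_ps / variable_cycle_ps
```

Working in integer percentages before dividing by 100 keeps the weighted average exact, so the result matches the hand calculation.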