Chapter5- The Processor: Datapath and Control (Single-cycle implementation) 臺大電機系吳安宇教授 V. 3/27/27 V2. 3/29/27 For 27 DSD Course 臺大電機吳安宇教授 - 計算機結構
Outline 5. Introduction 5.2 Logic Design Conventions 5.3 Building a Datapath 5.4 A Simple Implementation Scheme 臺大電機吳安宇教授 - 計算機結構 2
Introduction Show key issues in creating datapaths and designing controls. Design and implement the MIPS instructions including: () memory-reference instructions: lw, sw (2) arithmetic-logical instructions: add, sub, and, or, slt (3) branch instructions: beq, j Guideline in hardware implementation: () Make the common case fast (2) Simplicity favors regularity 臺大電機吳安宇教授 - 計算機結構 3
Overview of the implementation For every instruction, the first two steps are the same: Fetch: Send the Program Counter (PC) to the memory that contains the code (Instruction Fetch) Read registers: Use fields of the instructions to select the registers to read. Load/Store : read one register Others (e.g., R-type, beq) : read two registers lw $s, 2($s2) add $t, $s, $s2 臺大電機吳安宇教授 - 計算機結構 4
Common actions for instruction types Common actions for three instruction types: (all instructions use ALU after reading registers) () Memory-reference instructions: use ALU to calculate effective address (ex) lw $t offset($s5) -- compute offset + $s5 (2) Arithmetic-logical instructions: use ALU for opcode execution add, sub, and, or (3) Branch instructions: use ALU for comparison bne/slt $s, $s2 ($s-$s2, and check sign of the results) 臺大電機吳安宇教授 - 計算機結構 5
After using ALU After using ALU: ) Memory-reference instructions: need to access the memory containing the data to complete a load operation, or store a word to that memory location. 2) Arithmetic-logical instructions: write the result of the ALU back into a destination register. 3) Branch instructions: need to change the next instruction address based on the comparison (change of PC) 臺大電機吳安宇教授 - 計算機結構 6
Typical Instruction Execution 臺大電機吳安宇教授 - 計算機結構 7
An abstract view of MIPS CPU An abstract view of the implementation of the MIPS subset showing the major functional units and the major connections between them. 4 Add Add Data PC Address Instruction Instruction memory Register # Registers ALU Address Register # Register # Data memory Data 臺大電機吳安宇教授 - 計算機結構 8
Outline 5. Introduction 5.2 Logic Design Conventions (skipped) 5.3 Building a Datapath 5.4 A Simple Implementation Scheme 臺大電機吳安宇教授 - 計算機結構 9
Logic Design Conventions Latch v.s. Flip-flop Output is equal to the stored value inside the element. (don't need to ask for permission to look at the value) Change of state (value) is based on the clock. Latch: whenever the inputs change, and the clock is asserted. Flip-flop: state changes only on a clock edge. 臺大電機吳安宇教授 - 計算機結構
Logic Design Conventions D-latch Two inputs: the data value to be stored (D) the clock signal (C) indicating when to read & store D Two outputs: the value of the internal state (Q) and it's complement 臺大電機吳安宇教授 - 計算機結構
Logic Design Conventions D flip-flop (rising/falling edge) Output changes only on the clock edge (ex) the below is a D flip-flop with a falling-edge trigger 臺大電機吳安宇教授 - 計算機結構 2
Logic Design Conventions The function units in the MIPS implementation consist of two different types of logic elements: () combinational element : The outputs depend only on the current inputs. (2) state element : It contains state if it has some internal storage. Clocking methodology: defines when signals can be read and when they can be written. Control signal: used for multiplexer selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that us operated on by a functional unit. 臺大電機吳安宇教授 - 計算機結構 3
Logic Design Conventions Our Implementation An Edge-triggered methodology Typical execution: read contents of some state elements, send values through some combinational logic write results to one or more state elements 臺大電機吳安宇教授 - 計算機結構 4
Outline 5. Introduction 5.2 Logic Design Conventions 5.3 Building a Datapath 5.4 A Simple Implementation Scheme 臺大電機吳安宇教授 - 計算機結構 5
Building a Datapath Basic elements for access instructions (Instruction Fetch): (a) Instruction memory unit (b) Program Counter (PC): increase by 4 each time (c) Adder: to perform increase by 4 臺大電機吳安宇教授 - 計算機結構 6
Building a Datapath for fetching instructions A portion of datapath used for fetching instructions and incrementing the program counter 臺大電機吳安宇教授 - 計算機結構 7
Building a Datapath for R-type instructions Function: () read two registers (2) perform an ALU operation on the contents of registers (3) write the result back into the destination register Basic elements for R-type instructions: Read operation: () an input to the register file to specify the index of the registers to be read. (2) an output of the register contents. Write operation: () an input to the register file to specify the index of the registers to be written. (2) an input to supply the data to be written into the specified register. 臺大電機吳安宇教授 - 計算機結構 8
Building a Datapath (R-type) Elements which we need: (a) (b) Register file: a collection of registers in which any register can be read or written by specifying the index of the register in the file. ALU (32 bits): operate on the values read from the registers. 4 臺大電機吳安宇教授 - 計算機結構 9
Datapath Design for R-type Instructions [25:2] [3:] [2:6] [5:] /offset 臺大電機吳安宇教授 - 計算機結構 2
Building a Datapath for lw/sw Instructions Basic elements for load/store instructions: (a) Data memory unit: read/write data (b) Sign-extend unit: sign-extend the 6-bit offset field in the instruction to a 32-bit signed value. (c) Register file (d) ALU (add reg + offset to computer the mem address) -- (c) & (d) are just shown as the previous slide. 臺大電機吳安宇教授 - 計算機結構 2
Datapath for lw instructions [25:2] New Mem Address [2:6] [5:] Data from Reg2 /offset 臺大電機吳安宇教授 - 計算機結構 22
Datapath for sw instructions [25:2] [2:6] New Mem Address [5:] Data from Reg2 /offset 臺大電機吳安宇教授 - 計算機結構 23
Branch Instructions (ex) beq $t, $t2, offset # if ($t==$t2) goto (PC+4+offset) else execute next instruction Functions: () The offset field is shifted left 2 bits so that it s a word offset. (2) Branch is taken: when the condition is true, the branch target address becomes the new PC. (3) Branch isn t taken: the incremented PC (PC+4) replaces the current PC, just as for normal instruction. Operations: () Compute the branch target address. (2) Compare the contents of the two registers. 臺大電機吳安宇教授 - 計算機結構 24
Datapath for beq Instructions beq $t, $t2, offset [25:2] [2:6] [5:] /offset 臺大電機吳安宇教授 - 計算機結構 25
Datapath for both Memory and R-type Instructions (Fig. 5.) 臺大電機吳安宇教授 - 計算機結構 26
Simple Datapath for All three types of Instructions (Fig. 5.) 臺大電機吳安宇教授 - 計算機結構 27
Outline 5. Introduction 5.2 Logic Design Conventions 5.3 Building a Datapath 5.4 A Simple Implementation Scheme 臺大電機吳安宇教授 - 計算機結構 28
Basic datapath with control 臺大電機吳安宇教授 - 計算機結構 29
Design of ALU control unit Depending on the instruction type, the ALU will perform lw/sw: compute the memory address by addition R-type (add, sub, AND,OR, slt): depending on the value of the 6-bit function field Branch (beq): subtraction (R-R2) ALU control signals: ALU control lines Function AND OR add subtract set on less than NOR 臺大電機吳安宇教授 - 計算機結構 3
ALU control for each type of instruction Instruction opcode ALUOp Instruction operation Funct field Desired ALU action ALU control input LW load word xxxxxx add SW store word xxxxxx add Branch equal branch equal xxxxxx subtract R-type add add R-type subtract subtract R-type AND and R-type OR or R-type set on less than set on less than 臺大電機吳安宇教授 - 計算機結構 3
A Simple Implementation Scheme The truth table for the three ALU control bits (called Operation) ALUOp ALUOp ALUOp F5 F4 Funct field F3 F2 F F Operation x 臺大電機吳安宇教授 - 計算機結構 32
Simplify the ALU Control The ALU control bits are generated by ALUOp bits and Function code bits Because ALU control bit3 is always, omit it. When is ALU control bit2 == ALU control bit2 臺大電機吳安宇教授 - 計算機結構 33
Simplify the ALU Control When is ALU control bit == ALU control bit 臺大電機吳安宇教授 - 計算機結構 34
Simplify the ALU Control When is ALU control bit == ALU control bit 臺大電機吳安宇教授 - 計算機結構 35
Simplify the ALU Control ALU control logic (overall) 臺大電機吳安宇教授 - 計算機結構 36
Designing the Main Control Unit The two instruction classes Observations: op field: opcode (bit[3:26], which is called Op[5:]. The two registers to be read are specified by rs & rt (for R-type, beq). Base register (for lw, sw) is rs. 6-bit offset (for lw, sw, beq) is bit[5:] (also immediate values) The destination register is in one of the two places: lw : rt, bit[2:6] R-type : rd, bit[5:] /offset 臺大電機吳安宇教授 - 計算機結構 37
Simple Datapath with the Control Unit 臺大電機吳安宇教授 - 計算機結構 38
Effect of the 7 control signals Signal name RegDst Effect when deasserted() The register destination number for the Write register comes from the rf field(bits2-6). Effect when asserted() The register destination number for the Write register comes from the rd field(bits5-). RegWrite ALUSrc PCSrc MemRead MemWrite MemtoReg None The second ALU operand comes from the second register file output (Read data 2). The PC is replaced by the output of the adder that computes the value of PC + 4. None None The value fed to the register Write data input comes from the ALU. The register on the Write register input is written with the value on the Write data input. The second ALU operand is the sign-extend, lower 6 bits of the instruction. The PC is replaced by the output of the adder that computes the branch target. Data memory contents designated by the address input are put on the Read data output. Data memory contents designated by the address input are replaced by the value on the Write data input. The value fed to the register Write data input comes from the data memory. 臺大電機吳安宇教授 - 計算機結構 39
Control Unit Design The setting of the control lines is completed by the opcode field of the instruction. 臺大電機吳安宇教授 - 計算機結構 4
Operation for R-type instruction The 4 steps of the operation for R-type instruction add $t, $t2, $t3 Fetch instruction and increment PC ( Instr=Memory[PC] ; PC = PC + 4 ) Read registers ( Reg=Reg[rs], Reg2=Reg[rt] ) Run the ALU operation ( Result = Reg ALUop Reg2 ) Store the result into Register File ( Reg[rd] = Result ) 臺大電機吳安宇教授 - 計算機結構 4
Single-cycle MIPS with 4 instructions 臺大電機吳安宇教授 - 計算機結構 42
Operation for load instruction The 5 steps of the operation for load instruction /offset lw $t, offset($t2) Fetch instruction and increment PC ( Instr=Memory[PC] ; PC = PC + 4 ) Read registers ( $t2 = Reg[rs], only one register is read) Address computing ( Result = $t2 + sign-extend(instr[5-]) ) Load data from memory ( Data = Memory[Result] ) Store data into Register File (Reg[rt] = Data) 臺大電機吳安宇教授 - 計算機結構 43
Operation for store instruction The 4 steps of the operation for store instruction /offset sw $t, offset($t2) Fetch instruction and increment PC ( Instr=Memory[PC] ; PC = PC + 4 ) Read two registers ( Reg=Reg[rs], Reg2=Reg[rt] ) Address computing ( Result = Reg + sign-extend(instr[5-]) ) Store data into memory ( Memory[Result] = Reg2 ) 臺大電機吳安宇教授 - 計算機結構 44
Operation for beq instruction The 3 steps of the operation for beq instruction /offset beq $t, $t2, offset Fetch instruction and increment PC ( Instr=Memory[PC] ; PC = PC + 4 ) Read two registers ( Reg=Reg[rs], Reg2=Reg[rt] ) Compute branch target address ( Result = PC + ( sign-extend (Instr[5-] << 2 ) ) ) Run the ALU operation ( Result = Reg minus Reg2 ) Observe zero to branch or not ( If zero==, then PC = Result. Otherwise, PC unchanged ) 臺大電機吳安宇教授 - 計算機結構 45
臺大電機吳安宇教授 - 計算機結構 46 Finalizing the control signals ALUOp ALUOp Branch MemWrite MemRead RegWrite x x MemtoReg ALUSrc x RegDst outputs Op Op Op2 Op3 Op4 Op5 Inputs beq sw lw R-format Signal name Input/output
Datapath for Jump Jump operation: (opcode = ) Replace a portion of the PC(bit 27-) with the lower 26 bits of the instruction shifted left by 2 bits. The shift operation is accomplished by simple concatenating to the jump offest. Fixed 臺大電機吳安宇教授 - 計算機結構 47
Implementing Jumps JUMP 臺大電機吳安宇教授 - 計算機結構 48
Single-cycle implementation Why a single-cycle implementation isn t used today? Long cycle time for each instruction (load takes longest time) All instructions take as much time as the slowest one 臺大電機吳安宇教授 - 計算機結構 49
Performance of single-cycle implementation Example: Assumption: Memory units : 2 ps ALU and adders : ps Register file ( read / write) : 5 ps Multiplexers, control unit, PC accesses, sign extension unit, and wires have no delay. The instruction mix: 25% loads, % stores, 45% ALU instructions, 5% branches, 5% jumps. Problem: which one would be faster and by how much? () fixed clock cycle (2) variable-length clock cycle 臺大電機吳安宇教授 - 計算機結構 5
Answer: Performance of single-cycle implementation The critical path for the different instruction classes: Instruction class Functional units used by the instruction class R-type Instruction fetch Register access ALU Register access Load word Instruction fetch Register access ALU Memory access Register access Store word Instruction fetch Register access ALU Memory access Branch Instruction fetch Register access ALU Jump Instruction fetch Compute the require length for each instruction class: Instruction class Instruction memory Register read ALU operation Data memory Register write Total R-type 2 5 5 4 ps Load word 2 5 2 5 6 ps Store word 2 5 2 55 ps Branch 2 5 35 ps Jump 2 2 ps 臺大電機吳安宇教授 - 計算機結構 5
Performance of single-cycle implementation Calculation equations: CPU execution time = instruction count * CPI * clock cycle time Assume CPI=, CPU execution time = instruction count * clock cycle time Calculate CPU execution time : () fixed clock cycle : 6 ps (2) variable-length clock cycle : 6*25% + 55*% + 4*45% + 35*5% + 2* 5% =447.5 ps -- The one with variable-length clock cycle is faster. Performance ratio: CPU clock cycle (fixed) 6.34 CPU clock cycle (variable) 447.5 臺大電機吳安宇教授 - 計算機結構 52