JAIST Repository
Title: A Study of TLB Preloading by Page Address Prediction (ページアドレス予測による TLB プリローディングの研究)
Author(s): Ukezono, Tomoaki (請園 智玲)
Type: Thesis or Dissertation



Text version: author
Description: Supervisor: Tanaka Kiyofumi (田中清史), School of Information Science, Master's thesis
Japan Advanced Institute of Science and Technology

2 TLB

Copyright © 2003 by Ukezono Tomoaki

4 TLB TLB TLB TLB

5 Wide Range Support WRS WRS Multiple Operand Support MOS WRS MOS [Integer Unit] MMU MMU TLB i


7 PPTE PTE Wide Range Support WRS WRS WRS WRS Multiple Operands Support MOS MOS MOS WRS MOS MMU MPSR TLB TLB iii

8 MIPS R MIPS R3000 Attribute iv

9 OS OS OS TLB Translation Lookaside Buffer TLB TLB 1

10 TLB TLB TLB TLB 12.5% 20% [1] TLB TLB TLB [2] subblock TLB[3] TLB TLB TLB TLB CPU

11 TLB ( ) SRAM PC(Program Counter) 3

12 2 32 4KB 1 1/ TLB LRU( ) [1] ( ) 1 4

13 2 2.1 ªªª u ªªª preload preload ªªª +1 Last Hit PTE -1 ªªªªªªª TLBªªªªªªªª TLB 2.1: ( 3 TLB ) TLB ±1 PTE TLB 2 3 5

14 PTE TLB : Predictor Control Unit Prediction Address Generator Buffer Refill Selector 3 6

15 Refill Canceler Status Register 2.2 Predictor Control Unit Status Register Prediction Address Generator Predictor Control Unit VPN(Virtual Page Number) Last Hit PTE RRP VPN( Reamer Reference Position Virtual Page Number ) ±1 VPN VPN AS( Address Strobe ) TLB Buffer Refill Selector Predictor Control Unit Predicted VPN Predicted PPN( Physical Page Number ) TLB Hit VPN TLB Hit PPN (PTE) : 7

16 2.2 T 0 T 6 T 0 (PTE : Page Table Entry) T 1 PTE RRP RRP Data RRP Write Enable TLB T 2 TLB TLB Last Hit PTE TLB TLB PTE TLB (T 1 T 2) TLB TLB TLB PTE T 2 TLB TLB TLB Hit VPN TLB Hit PPN ACK PTE T 3 TLB Hit VPN TLB Hit PPN VPN VPN+1 VPN 1 T 3 VPN+1 VPN Out VPN TLB TLB PTE T 4 VPN+1 PTE Predicted VPN Predicted PPN PTE T 5 T 4 T 3 VPN 1 T 6 2 VPN PTE TLB (T 1 T 2 T 3 T 4 T 5 T 6) 3 12 CPU PTE 8

17 VPN±1 TLB Last Hit PTE 2.2 PTE PTE TLB 3 1 PTE PTE LRU TLB PTE 2 TLB PTE PTE PTE PTE PTE PTE PPTE Pointer Page Table Entry PTE PPTE PPTE TLB PTE PPTE PPTE PTE TLB PPTE 9

18 ªªª PPTE PPTE(Pointer Page Table Entry) Ç Ç Ç Ç Data set A PPTE Data set B PPTE Data set C PPTE PTE ªªª ªªªªª «««T L B PPTE PPTE PPTE 2.4: PPTE PTE 2.4 A B C TLB PPTE 3 TLB PTE PPTE TLB TLB 3 PTE TLB 3 1 TLB 1 TLB 32 TLB TLB 35 TLB 1 TLB PPTE 3 10

19 PPTE 3 TLB TLB Split TLB 4 TLB Shard TLB 5 TLB TLB TLB TLB TLB TLB TLB 5 TLB TLB MIPS TLB TLB TLB 2 TLB 11

20 2.5: Last Hit PTE ±1 PTE 3 1 PTE VPN PPN, Attribute TLB Valid RRP 6 RRP RRP PTE MIPS Dirty Valid Attribute Dirty Attribute 12

21 d +1 ªªª -1 ªªª VPN=97 VPN=97 VPN=97 RRP VPN VPN=98 VPN=99 VPN=100 VPN=98 VPN=99 VPN=100 VPN=98 VPN=99 VPN=100 VPN=101 VPN=101 VPN=101 VPN=102 VPN=102 VPN=102 VPN=103 VPN=103 VPN=103 ªªª RPP BIT +1 ªªª RPP BIT -1 ªªª RPP BIT 100 VPN= VPN=102 Refill 001 VPN=99 RRP VPN 001 VPN= VPN= VPN= VPN= VPN=101 Refill 100 VPN=98 2.6: VPN VPN VPN=100 VPN VPN=97 VPN=103 VPN=100 RRP VPN ±1 VPN=101 VPN=99 +1 VPN=101 VPN=101 RRP VPN -1 VPN=99 (RRP ) RRP (RRP) VPN+1 VPN-1 RRP VPN Buffer Refill Selector 13

22 VPN=100 VPN=101 VPN=99 VPN=102-1 VPN=100 VPN=99 VPN=101 VPN=98 RRP PTE VPN+1 VPN-1 RRP 14
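The window movement in Figure 2.6 (VPN=100 at the RRP, with VPN=99 and VPN=101 preloaded, and a refill of VPN=102 after a +1 hit or VPN=98 after a -1 hit) can be sketched in C as follows. This is only an illustrative software model, not the VHDL design: the type `window_t`, the function `window_access`, and the policy of restarting the window on a miss are assumptions introduced here.

```c
#include <stdint.h>

/* Hypothetical model of the three-entry prediction window:
 * the RRP VPN plus its +1 and -1 neighbours. */
typedef struct {
    uint32_t rrp_vpn;     /* VPN currently at the reference position (RRP) */
    uint32_t refill_vpn;  /* VPN most recently requested for preload */
} window_t;

typedef enum { HIT_PLUS1, HIT_MINUS1, WINDOW_MISS } outcome_t;

/* Called with the VPN that missed in the TLB. */
static outcome_t window_access(window_t *w, uint32_t vpn)
{
    if (vpn == w->rrp_vpn + 1u) {
        /* +1 hit: the RRP moves up and the new +1 neighbour is refilled. */
        w->rrp_vpn = vpn;
        w->refill_vpn = vpn + 1u;
        return HIT_PLUS1;
    }
    if (vpn == w->rrp_vpn - 1u) {
        /* -1 hit: the RRP moves down and the new -1 neighbour is refilled. */
        w->rrp_vpn = vpn;
        w->refill_vpn = vpn - 1u;
        return HIT_MINUS1;
    }
    /* Assumed policy: restart the window around the missing page.
     * Both neighbours would need preloading; only the +1 request is recorded. */
    w->rrp_vpn = vpn;
    w->refill_vpn = vpn + 1u;
    return WINDOW_MISS;
}
```

Starting from `rrp_vpn = 100`, an access to VPN 101 moves the RRP to 101 and requests VPN 102, which is the +1 case drawn in the figure.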

23 Wide Range Support Wide Range Support WRS ±1 PTE WRS ±2 ±3 PTE WRS 3.1 WRS ±2 2WRS ±3 3WRS 3WRS VPN±2 VPN± WRS WRS TLB WRS TLB PPTE

24 WRS WRS PTE WRS ªªª u ªªª preload preload preload preload preload preload Last Hit PTE TLB ªªªªªªª TLBªªªªªªªª 3.1: Wide Range Support WRS WRS WRS 1PTE WRS PTE PTE WRS WRS PTE

[Figure: WRS. Labels: BURST ADDRESS, 2BIT OFFSET (1PTE=1WORD), Buffer Memory Address, PTP + VPN = PTE Address]

26 WRS PTE 32Bit PTEA(Page Table Entry Address) PTP(Page Table Pointer) VPN 1 VPN PTE 4KB VPN 32bit 12bit =20bit 1PTE32 4byte 2 PTEA 4MB 2 1WORD (bit 8 3bit) WRS 8 4WRS 8+1=9 RRP VPN 8 ( ) RRP VPN +1 VPN+1 1 VPN SDRAM (VPN + ) (VPN ) 1 CPU 1 PTP VPN ( PTP 0 ) PTEA PTE 2 PTEA MIPSR4000 PTE PTE PTE 18
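The address arithmetic described here (4KB pages, a 20-bit VPN taken from a 32-bit address, one 4-byte PTE per page, and a 4MB page table) can be written out as a short C sketch. The function names are hypothetical, and a flat linear page table starting at PTP is assumed.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u  /* 4KB pages: the low 12 bits are the page offset */
#define PTE_SHIFT   2u  /* one PTE is one 32-bit word (4 bytes) */

/* VPN: the upper 32 - 12 = 20 bits of the virtual address. */
static uint32_t vpn_of(uint32_t vaddr)
{
    return vaddr >> PAGE_SHIFT;
}

/* PTEA = PTP + VPN * 4, i.e. the VPN shifted left by the 2-bit word offset. */
static uint32_t ptea_of(uint32_t ptp, uint32_t vpn)
{
    return ptp + (vpn << PTE_SHIFT);
}

/* Size of a linear page table: 2^20 entries * 4 bytes = 4MB. */
static uint32_t page_table_bytes(void)
{
    return (1u << (32u - PAGE_SHIFT)) << PTE_SHIFT;
}
```

Because consecutive VPNs map to consecutive words, the PTEs for VPN-1, VPN, and VPN+1 sit at adjacent addresses, which is what makes a single burst read of neighbouring PTEs possible.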

Table 3.1 (burst length 8, two addressing modes)

Starting Address    Addressing (Decimal)
A2 A1 A0            Sequential                Interleave
0  0  0             0, 1, 2, 3, 4, 5, 6, 7    0, 1, 2, 3, 4, 5, 6, 7
0  0  1             1, 2, 3, 4, 5, 6, 7, 0    1, 0, 3, 2, 5, 4, 7, 6
0  1  0             2, 3, 4, 5, 6, 7, 0, 1    2, 3, 0, 1, 6, 7, 4, 5
0  1  1             3, 4, 5, 6, 7, 0, 1, 2    3, 2, 1, 0, 7, 6, 5, 4
1  0  0             4, 5, 6, 7, 0, 1, 2, 3    4, 5, 6, 7, 0, 1, 2, 3
1  0  1             5, 6, 7, 0, 1, 2, 3, 4    5, 4, 7, 6, 1, 0, 3, 2
1  1  0             6, 7, 0, 1, 2, 3, 4, 5    6, 7, 4, 5, 2, 3, 0, 1
1  1  1             7, 0, 1, 2, 3, 4, 5, 6    7, 6, 5, 4, 3, 2, 1, 0

?? VPN ± + VPN+5 VPN 2 ±4 WRS VPN±1PTE 3.3 RRP VPN Case A Case B ±1VPN RRP VPN PTEA 4WRS PTE PTE Case A 000 VPN PTE Case B 111 VPN PTE 8 WRS RRP VPN 1/4 RRP VPN PTEA PTE VPN+1 PTEA
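The two burst orderings of Table 3.1 follow simple rules, which the C sketch below reproduces. Sequential order increments the starting column address and wraps within the 8-word block, while interleave order XORs the beat number with the starting address. The function names are hypothetical and the code only models the address sequence, not the memory controller.

```c
#define BURST_LEN 8u

/* Sequential burst: (start + beat) wraps around inside the 8-word block. */
static unsigned burst_sequential(unsigned start, unsigned beat)
{
    return (start + beat) % BURST_LEN;
}

/* Interleaved burst: the beat number is XORed with the starting address. */
static unsigned burst_interleave(unsigned start, unsigned beat)
{
    return (start ^ beat) % BURST_LEN;
}
```

For starting address 101 (decimal 5), the fourth beat in sequential order wraps to word 0, so a PTE that sits near the end of a burst block is returned together with words from the start of the same block rather than from the next block.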

[Figure: WRS. Labels: Page Table, Burst Address, PTE; Case A: VPN-1, RRP VPN, VPN+1; Case B: VPN-1, RRP VPN, VPN+1]

29 8 1/4 1/8 VPN 1 PTEA PTE WRS 2WRS 3.4 CPU WRS 16byte 2WRS 32 4 WRS 3 RRP VPN Register Burst Buffer RRP RRP VPN RRP WRS RRP VPN RRP WRS WRS RRP VPN Register 1 4 (Predictor Control Unit) (Address Generator) WRS 4 RRP PTE ( ) VPN PTE WRS PTE PTE 21

[Figure 3.4: 2WRS]

[Figure 3.5]

[Figure 3.6]

33 PTE Refill Selector Burst Controler Refill Selector RRP 25

34 3.2 Multiple Operand Support Multiple Operands Support MOS MOS 3.7 ªªª u ªªª preload preload ªªªªa ªªªªb ªªªªc +1 Last Hit PTE of a Last Hit PTE of b Last Hit PTE of c -1 ªªªªªªª TLBªªªªªªªª TLB 3.7: Multiple Operands Support MOS for(i=0;i<n;i++) a[i]=b[i]+c[i]; WRS b c a 26

5 MOS MOS 2 2MOS 3 3MOS 3.7 3MOS 3MOS 3 a b c MOS 1. MOS 2. MOS 3MOS MOS 3. MOS 4. LRU (Least Recently Used) ( ) 1 MOS MOS 2 MOS MOS 3 5 SPARC CPU

36 RRP VPN TLB 4 LRU LRU 1 LRU 1 MOS 4 LRU MOS 4MOS 3.8 MOS LP 4MOS LP0 LP VPN MOS VPN Preliminary Buffer Check Active MOS Switcher VPN Preliminary Buffer Check MOS LRU LRU TLB LRU LRU LRU 4MOS 4bit 4=16bit 16bit 6 (LRU Bit Field) 4MOS nmos n 2 6 LRU (10 ) LRU LRU 2 MOS 28
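A 6-bit LRU Bit Field for 4MOS equals the number of entry pairs, C(4,2) = 6. One common scheme that needs exactly this many bits is pairwise LRU, sketched below in C. Whether the thesis hardware encodes its bits this way is an assumption, and all names here are hypothetical.

```c
#include <stdint.h>

#define N_MOS 4u

/* Bit index for the pair (i, j) with i < j.
 * Order: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) -> bits 0..5. */
static unsigned pair_bit(unsigned i, unsigned j)
{
    return i * (2u * N_MOS - i - 1u) / 2u + (j - i - 1u);
}

/* Bit (i, j) set means entry i was used more recently than entry j.
 * Marking entry e as used makes it newer than every other entry. */
static uint8_t lru_touch(uint8_t field, unsigned e)
{
    for (unsigned k = 0; k < N_MOS; k++) {
        if (k == e)
            continue;
        if (e < k)
            field |= (uint8_t)(1u << pair_bit(e, k));
        else
            field &= (uint8_t)~(1u << pair_bit(k, e));
    }
    return field;
}

/* The victim is the entry that is not newer than any other entry. */
static unsigned lru_victim(uint8_t field)
{
    for (unsigned e = 0; e < N_MOS; e++) {
        int oldest = 1;
        for (unsigned k = 0; k < N_MOS; k++) {
            unsigned newer;
            if (k == e)
                continue;
            if (e < k)
                newer = (field >> pair_bit(e, k)) & 1u;
            else
                newer = !((field >> pair_bit(k, e)) & 1u);
            if (newer) {
                oldest = 0;
                break;
            }
        }
        if (oldest)
            return e;
    }
    return 0u; /* not reached when the field was built only by lru_touch */
}
```

Compared with keeping a full 4x4 matrix of 16 bits, the pairwise form stores each comparison once, which is consistent with the reduction from 16 bits to 6 bits mentioned in the text.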

37 3.8: 4MOS LRU LRU Controler LRU Bit Field MOS RRP ( VPN ) PTE MOS LP VPN Memory Access Request Que LP ACK LP 29

[Figure 3.9: MOS]

39 VPN ACK MOS Control Unit ACK Memory Access Request Que MOS MOS 1 MOS MOS MOS MOS MOS 3.3 WRS MOS TLB TLB TLB WRS MOS TLB TLB TLB PPTE TLB TLB WRS MOS 1 4KB WRS ± PTE PPTE TLB WRS MOS 31

40 TLB WRS TLB WRS MOS WRS MOS TLB PPTE WRS MOS 3.10 preload Last Hit PTE of a Last Hit PTE of b Last Hit PTE of c u ªªª TLB ªªªªªªª TLBªªªªªªªª 3.10: WRS MOS 32

41 4 VHDL 4.1 Integer Integer Unit Unit Data Data TLB TLB Instruction Instruction TLB TLB TLB TLB Misshandler Misshandler Instruction TLB Instruction TLB Predictor Predictor Buffer Buffer Buffer Buffer Data TLB Data TLB Predictor Predictor 4.1: 3 TLB TLB 33

42 MMU(Memory Management Unit) TLB TLB MMU 4.1 [Integer Unit] CPU Integer Unit Integer Unit : Out-of-Order 5 32 IF(InstructionFetch) ID(Instruction Decode) EX(EXecution) MEM(MEMory access) WB(Write Back) MIPSI ID 34

43 EX ID TLB MMU MIPS R3000 A 4.2 MMU MMU Integer Unit MIPS MMU MIPS MIPS TLB MMU TLB MMU MMU TLB MIPS TLB TLBI(TLB Index) TLBR(TLB Read) TLBW(TLB Write) TLBP(TLB Probe) 0 MTC0(Move To Coprocessor 0) MFC0(Move From Coprocessor 0) 1 MMU MIPS TLB MIPS R3000 TLB 4.2 MIPS TLB MMU Reserved COP0 31 MMU and Predictor Status(MPSR) MMU and Predictor Status 4.4 MPSR 1 4 TA(TLB Active) TLB 1 0 TLB TF(TLB Flush) TLB 1 1 TLB Valid 1 MIPS CPU Integer Unit MIPS 0(COP0) 1(COP1) 35

[Figure 4.3: MMU]

Table 4.1: Coprocessor 0 (COP0) registers

No.   MIPS COP0    COP0 in this design
0     Index        Reserved
1     Random       Reserved
2     EntryLo0     Reserved
3     EntryLo1     Reserved
4     Context      PTP Register
5     PageMask     Reserved
6     Wired        Reserved
8     BadVaddr     Reserved
10    EntryHi      Reserved
12    SR           SR
13    Cause        Reserved
14    EPC          Reserved
15    PRID         PRID
31    Reserved     MMU and Predictor Status

Table 4.2: MIPS R3000

Segment   Address range      Size    Mode
kseg0     0xa xbfff ffff     0.5GB   TLB OFF & Cache ON
kseg1     0x x9fff ffff      0.5GB   TLB OFF & Cache OFF
kseg2     0xc xffff ffff     1GB     TLB ON & Cache ON
kuseg     0x x7fff ffff      2GB

[Figure 4.4: MPSR. Field layout, left to right: PA, 000...000, PF, TF, TA. TA: TLB Active; TF: TLB Flush; PF: Predictor Flush; PA: Predictor Active]
0 Valid 0 TLB TLB 2 PF(Predictor Flush) Valid 0 Valid 0 PA(Predictor Active) 1 0 TLB TLB 1 TA 1 TLB VPN PTEA Dirty TLB TLB 2 MIPS TLB ASID TLB Flush ASID

47 4.2.2 TLB TLB MIPS TLB MIPS R3000 TLB TLB TLB : TLB 32 4KB TLB TLB MIPS Entry Hi Entry Lo Entry Hi MIPS R3000 Entry Hi VPN ASID 0 3 TLB Entry Hi VPN ASID Entry Lo MIPS R3000 Entry Lo 3 0 TLB 32 39

PFN(Physical Frame Number) 4 Attribute 0 PFN Attribute N D V G TLB Entry Lo PPN Attribute MIPS R3000 Attribute : MIPS R3000 Attribute N D V G 1 1 Write,1 TLB 1 Entry Hi ASID MIPS TLB Attribute TLB PTE MIPS Attribute TLB MMU TLB 4 PPN(Physical Page Number) MIPS R6000 PPN
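For reference, the MIPS R3000 Entry Hi and Entry Lo formats mentioned above can be decoded with a few shifts and masks. The sketch below uses the standard R3000 layout (VPN in bits 31..12 and ASID in bits 11..6 of Entry Hi; PFN in bits 31..12 and N, D, V, G in bits 11..8 of Entry Lo). The struct and function names are hypothetical, and the TLB entry format of the design described in this thesis may differ from this layout.

```c
#include <stdint.h>

typedef struct {
    uint32_t vpn;   /* virtual page number, bits 31..12 */
    uint32_t asid;  /* address space ID, bits 11..6 */
} entry_hi_t;

typedef struct {
    uint32_t pfn;   /* physical frame number, bits 31..12 */
    unsigned n;     /* Non-cacheable, bit 11 */
    unsigned d;     /* Dirty (write permitted), bit 10 */
    unsigned v;     /* Valid, bit 9 */
    unsigned g;     /* Global (ASID ignored on match), bit 8 */
} entry_lo_t;

static entry_hi_t decode_entry_hi(uint32_t raw)
{
    entry_hi_t hi;
    hi.vpn  = raw >> 12;
    hi.asid = (raw >> 6) & 0x3fu;
    return hi;
}

static entry_lo_t decode_entry_lo(uint32_t raw)
{
    entry_lo_t lo;
    lo.pfn = raw >> 12;
    lo.n = (raw >> 11) & 1u;
    lo.d = (raw >> 10) & 1u;
    lo.v = (raw >> 9) & 1u;
    lo.g = (raw >> 8) & 1u;
    return lo;
}
```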

49 TLB 4.6 ª 4.6: TLB TLB VPN PPN Attribute LRU LRU Dirty Valid TLB LRU 1 LRU 4KB VPN PPN Dirty 41

50 5 VHDL C 5.1 TLB 2.2 Synopsys FPGA Compiler II Xilinx FPGA VIRTEX2 2V6000FF1152 ( ) 20MHz Primitive reference count FPGA LUT(LookUp Table) LSI Timing Path Groups Clocks 5.2 Model Technology ModelSim C TLB TLB / MMU TLB MIPS TLB TLB 42

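#define PAGE_SIZE 1024 /* 4KB/4 */
#define NUM_OF_TLB_MISS 5

int test_data[PAGE_SIZE * NUM_OF_TLB_MISS];

int dataset_access(void);

int main(void)
{
    register int a;

    a = dataset_access();
    (void)a; /* the result is not used further */
    return 0;
}

int dataset_access(void)
{
    int i, sum = 0; /* the accumulator must start at zero */

    /* write pass: touches NUM_OF_TLB_MISS consecutive pages */
    for (i = 0; i < PAGE_SIZE * NUM_OF_TLB_MISS; i++)
        test_data[i] = i;

    /* read pass: touches the same pages again */
    for (i = 0; i < PAGE_SIZE * NUM_OF_TLB_MISS; i++)
        sum += test_data[i];

    return sum;
}

5.1: 1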

52 MMU 1WRS-1MOS 1WRS-4MOS 2WRS-1MOS CPU 1WRS-1MOS CPU 1WRS-4MOS CPU 2WRS-1MOS CPU CPU MMU 1WRS-1MOS CPU MMU 1WRS-1MOS 1WRS-4MOS CPU MMU 1WRS-1MOS 1WRS-4MOS 2WRS-1MOS CPU MMU 1WRS-1MOS 2WRS-1MOS (PIPELINE) MMU (MMU) 1WRS-1MOS (1WRS-1MOS) 1WRS-4MOS (1WRS-4MOS) 2WRS-1MOS (2WRS-1MOS) 44

53 6.1: FDE FD LUT XORCY TLB MUX CMP PIPELINE MMU WRS-1MOS WRS-4MOS WRS-1MOS Normal CPU WRS-1MOS CPU WRS-4MOS CPU WRS-4MOS CPU WRS-1MOS CPU CPU(Normal CPU) 1WRS-1MOS CPU(1WRS-1MOS CPU) 1WRS-4MOS CPU(1WRS-4MOS CPU) 2WRS-1MOS CPU(2WRS-1MOS CPU) 6.1 FDE D FD D LUT XORCY XOR TLB MUX CMP FPGA MUX CMP LUT CPU TLB TLB LUT FDE 1TLB FDE 1344 LUT 1551 TLB 1WRS-4MOS CPU 1WRS-4MOS CPU FDE LUT 2 FDE FD LUT XORCY 1WRS-4MOS CPU Normal CPU 1WRS-1MOS 3.5% 1WRS-4MOS 17.5% 2WRS-1MOS 5.0% 1WRS-1MOS 5.5% 1WRS-4MOS 28.5% 2WRS-1MOS 4.0% MOS 45

54 50% 80% FPU Normal CPU CPU TLB- CPU Normal CPU 1WRS-1MOS CPU 11.5% 1WRS-4MOS CPU 24.5% 2WRS-1MOS CPU 13.5% 1WRS-1MOS CPU 25.85% 1WRS-4MOS CPU 49.4% 2WRS-1MOS CPU 25.8% : In- RC(ns) RC- Out(ns) RC- RC(ns) (MHz) PIPELINE MMU WRS-1MOS WRS-4MOS WRS-1MOS Normal CPU WRS-1MOS CPU WRS-4MOS CPU WRS-1MOS CPU In- RC(ns) RC- Out(ns) RC- RC(ns) 3 PIPELINE 11.67ns CPU Normal CPU 1WRS-1MOS CPU 1WRS-4MOS CPU 2WRS-1MOS CPU 85.69MHz PIPELINE PIPELINE In- RC RC- Out RC- Out TLB Normal CPU 1WRS-1MOS CPU 1WRS-4MOS CPU 2WRS-1MOS CPU VPN 46

55 FPGA / FPGA Compiler

56 7 48

57 8 MMU TLB TLB TLB MMU TLB 49

58 A 50

59 8.1: 51

60 Synopsys Model Technology University Program 52

[1] Ashley Saulsbury, Fredrik Dahlgren and Per Stenström: Recency-Based TLB Preloading, Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000.
[2] M. Talluri, S. Kong, M. D. Hill and D. A. Patterson: Tradeoffs in Supporting Two Page Sizes, Proc. of ISCA, 1992.
[3] Madhusudhan Talluri and Mark D. Hill: Surpassing the TLB Performance of Superpages with Less Operating System Support, Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems.
[4] J. E. Smith: A Study of Branch Prediction Strategies, Proc. of ISCA, May.