Microsoft PowerPoint - CEM-07-Parallel.pptx


Parallel Scientific Computing by Computer Cluster
Jiun-Hwa Lin
Department of Electrical Engineering, National Taiwan Ocean University

Outline
- Introduction
- Simple Cluster Setup
- Real Examples at NTOU
- Conclusions

What is Parallel Computing?
- A group of processors solves a problem collaboratively and simultaneously.
- The processors are interconnected.
- While doing its own share of the work, each processor needs to communicate with the others.
(from Dept. of Computer Science & Information Management, Providence Univ.)

Why Parallel Computing?
- Solve larger and more complex problems
- Grand Challenge Problems

Large-scale First-Principles Simulations of Shocks in Deuterium
http://www.llnl.gov/asci/

ASCI White
The simulation involved 1320 atoms and ran for several days on 2640 processors of ASCI White.
- 512-node SMP (16 CPUs/node)
- Peak speed: 12+ TeraOP/s

High-Resolution Simulations of Global Climate
[Figure panels: ~300 km, ~75 km, and ~50 km resolution]

High-Resolution Simulations of Global Climate
A series of global climate simulations performed with the NCAR CCM3 atmospheric model.

More Applications
(from Dept. of Computer Science & Information Management, Providence Univ.)

Why Parallel Computing?
Research areas that demand high-performance computing:
- Nano-scale Electronics
- Computational Chemistry
- Aerospace
- Molecular Modeling
- Computational Electromagnetics
- Computational Acoustics
- Computational Fluid Dynamics
- Seismic Wave Propagation
- Plasma Physics
- And others

What is a Computer Cluster?
- The poor man's supercomputer
- A commodity-based cluster system designed as a cost-effective alternative to large supercomputers

PC Cluster
- M²COTS (Mass-Market Commodity-Off-The-Shelf) based systems
- Hook up PCs or workstations
(from KAOS, Univ. of Kentucky)

Beowulf-Class Systems
Beowulf was the legendary sixth-century hero from a distant realm who freed the Danes of Heorot by destroying the oppressive monster.
(from Eric Fraser)

Beowulf PC Clusters
As a metaphor, Beowulf has been applied to a new strategy in high-performance computing that exploits mass-market technologies to overcome the oppressive costs in time and money of supercomputing, thus freeing scientists, engineers, and others to devote themselves to their respective disciplines.
(from http://www.heorot.dk and the Störtebeker Cluster Project)

Beowulf PC Clusters
Beowulf, both in myth and in reality, challenges and conquers a dominant obstacle in its respective domain, thus opening the way to future development.
[Figure labels: rank0, rank1, rank2, rank3, server, hub or switch]

Why a Computer Cluster?
- Low cost
- High performance
- Configurability
- Scalability
- High availability
(from the Störtebeker Cluster Project)

How to Parallel Compute: Processors
- PCs, workstations, multi-CPU systems, SMP, DMP, ...
- 64-bit processors: Athlon 64, Itanium, PowerPC G5
- Dual-CPU motherboard (from ASUS)

SMP: Shared-Memory Multiprocessor System
(from Dept. of Computer Science & Information Eng., Tunghai Univ.)

DMP: Distributed-Memory Multiprocessor System
(from Dept. of Computer Science & Information Eng., Tunghai Univ.)

How to Parallel Compute: Interconnect
Networking, switches, ...
- Ethernet (10 Mbps)
- Fast Ethernet (100 Mbps)
- Gigabit Ethernet (1 Gbps)
- Myrinet (2 Gbps)

How to Parallel Compute: Software
O.S., languages, libraries, algorithms, compilers, ...
(from Dept. of Computer Science & Information Eng., Tunghai Univ.)

How to Parallel Compute: Logical view of cluster systems
(from Dept. of Computer Science & Information Eng., Tunghai Univ.)

Linux
Free & Stable
"Software is like sex, it's better when it's free." (Linus Torvalds)
[Figure: Tux]

Simple Cluster Setup Recipe: Hardware Configuration
- Intel Pentium 3, 500 MHz
- 512 MB SDRAM
- IDE hard disk
- Fast Ethernet interface cards (100 Mbps)
- Category 5 cables
- Hub or switch (100 Mbps)
- Monitor, keyboard & mouse
[Figure: PC cluster in EMLAB at NTOU-EE]

Simple Cluster Setup Recipe: Software Configuration
- Operating system: RedHat Linux 7.2 (kernel 2.4.7-10)
- Compilers: gcc-g++-2.96
- Parallel interface: MPICH 1.2.4 (P4 device)

5.3 Directions
5.3.1 First Step
- Install Linux on each of the PCs.
- Edit the file /etc/hosts on each of the 4 PCs:
  192.168.1.1 octopus0.ee.ntou.edu.tw octopus0
  192.168.1.2 octopus1.ee.ntou.edu.tw octopus1
  192.168.1.3 octopus2.ee.ntou.edu.tw octopus2
  192.168.1.4 octopus3.ee.ntou.edu.tw octopus3
EM-LAB, Department of Electrical Engineering, National Taiwan Ocean University

5.3.2 Second Step
- Edit the file /etc/hosts.equiv on each of the 4 PCs:
  octopus0
  octopus1
  octopus2
  octopus3
- This configures the computers so that MPICH's P4 device can be used to execute a distributed parallel application.

5.3.3 Third Step
- On the server node, make a directory /home/mirror. Configure the server to be an NFS server, and add this line to /etc/exports:
  /home/mirror octopus0(rw) octopus1(rw) octopus2(rw) octopus3(rw)

5.3.4 Fourth Step
- On the other (non-server) nodes, make a directory /home/mirror. Add this line to /etc/fstab:
  octopus0:/home/mirror /home/mirror nfs rw,bg,soft 0 0
- This exports the directory /home/mirror from the server and mounts it on each of the clients for easy distribution of software between the nodes.
- On the server node, install MPICH.

5.3.5 Fifth Step
- For each user that you create on the cluster, it is advisable to create a subdirectory owned by that user in the /home/mirror directory, such as /home/mirror/ryjou, where the user can put MPI programs and shared data files.

5.4 Installing MPICH 1.2.4
5.4.1 First Step
- Download MPICH from www.mcs.anl.gov/mpi/mpich/download.html or from ftp.mcs.anl.gov in directory pub/mpi. Get mpich.tar.gz.
- Unpack mpich.tar.gz:
  % cd /tmp
  % tar zxovf mpich.tar.gz
- If tar does not accept the z option, use:
  % cd /tmp
  % gunzip -c mpich.tar.gz | tar xovf -

5.4.2 Second Step
- Configure (installation directory: /usr/local/mpich-1.2.4/):
  % ./configure --prefix=/usr/local/mpich-1.2.4 |& tee c.log
- Build:
  % make |& tee make.log
- Run an example:
  % cd examples/basic
  % make cpi
  % ../../bin/mpirun -np 4 cpi

5.4.3 Third Step
- Install (as root):
  % make install
- Set the path by editing /home/ryjou/.cshrc:
  ...
  setenv PATH /usr/sbin:/sbin:${PATH}
  ...
  set path = ($path /usr/local/mpich-1.2.4/bin)

5.4.4 Fourth Step
- Check the installation:
  % source .cshrc
  % rehash
  % which mpirun
  /usr/local/mpich-1.2.4/bin/mpirun
- Set the machines to be used by editing /usr/local/mpich-1.2.4/share/machines.linux:
  octopus0.ee.ntou.edu.tw
  octopus1.ee.ntou.edu.tw
  octopus2.ee.ntou.edu.tw
  octopus3.ee.ntou.edu.tw

5.5 Compiling, Linking, and Running a Program
In directory /home/ryjou/pmlfma:
- Compile:
  % mpicc -c mmtps.cpp
- Link:
  % mpicc -o mmtps mmtps.o
- Compile and link in a single command:
  % mpicc -o mmtps mmtps.cpp
In directory /home/mirror/ryjou/pmlfma:
- Run (mmtps must be copied to this directory first):
  % mpirun -np 4 mmtps

5.6 Some MPI Statements
5.6.1 Basic
- #include "mpi.h"
- #include <mpi++.h>
- int main(int argc, char *argv[])
- void MPI::Init(int& argc, char**& argv)
- void MPI::Finalize()
- MPI::Intracomm::Bcast(void* buffer, int count, const Datatype& datatype, int root) const
- MPI::Intracomm::Reduce(const void* sendbuf, void* recvbuf, int count, const Datatype& datatype, const Op& op, int root) const

5.6.2 Some Statements Used for Communication in NTOU PMLFMA
- void Intracomm::Allgatherv(const void* sendbuf, int sendcount, const Datatype& sendtype, void* recvbuf, const int recvcounts[], const int displs[], const Datatype& recvtype) const
  Gathers data from all tasks and delivers it to all.
- void Intracomm::Barrier() const
  Blocks until all processes have reached this routine.
- Request Comm::Irecv(void* buf, int count, const Datatype& datatype, int source, int tag) const
  Begins a nonblocking receive.
- Request Comm::Isend(const void* buf, int count, const Datatype& datatype, int dest, int tag) const
  Starts a nonblocking send.

Real Examples in EM Field
NTOU's PMLFMA
[Figure labels: EM wave, induced current, scattered EM wave, conductor]

Multilevel Fast Multipole Algorithm (MLFMA)
- Enclose the object in a cube.
- Divide the cube into 8 subcubes.
- Recursively divide each subcube into smaller subcubes until the subcube length is 0.5.
- Retain only the nonempty cubes in the whole oct-tree structure.

Triangular Patches Modeling Objects
Unknowns = 120,182
Incident frequency = 0.9 GHz

Current Distribution

RCS

Real Examples in EM Field
CPU time for the EM-LAB PC cluster on Linux
Unknowns               60,165    120,182
Pre-iteration (s)       4,249     52,486
Iteration (s)          30,762     12,686
Each iteration (s)       67.6      111.2
Total (s)              35,011     65,172

Real Examples in EM Field
Memory requirement for the EM-LAB PC cluster on Linux
Unknowns                     60,165    120,182    243,706 (estimated)
Required memory (KB)/node    91,960    388,884    1,000,000

Real Examples in EM Field
NTOU's PFDTD
[Figures: layout of the 3-D FDTD space, including the PML; the block handled by each PC]

Real Examples in EM Field
[Figures: method of distributing the data on block edges (1); method of distributing the data on block edges (2)]

Conclusions
What is required of users:
- MPI
- Parallel algorithms
- Knowledge of parallel computing
Cluster computing systems are rapidly becoming the standard platform for high-performance computing.
Message-passing programming is the most obvious approach to taking advantage of cluster performance.
New trends in hardware and software technologies are likely to make clusters more promising.

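GPGPU: General-Purpose Graphics Processing Unit
- Modern graphics chips are highly programmable. Because they usually offer very high memory bandwidth and a large number of execution units, the idea arose of using them to help with computational work. This is GPGPU.
- CUDA (Compute Unified Device Architecture) is NVIDIA's GPGPU model.
- NVIDIA's newer graphics chips, including the GeForce 8 series and later, support CUDA.
- NVIDIA provides the CUDA development tools (Windows and Linux versions), sample programs, and documentation free of charge. They can be downloaded from CUDA Zone.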

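Pros and Cons of GPGPU
Compared with a CPU, using a graphics chip for computation has several main advantages:
- Graphics chips usually have higher memory bandwidth. For example, NVIDIA's GeForce 8800GTX has more than 50 GB/s of memory bandwidth, while current high-end CPUs have about 10 GB/s.
- Graphics chips have many more execution units. For example, the GeForce 8800GTX has 128 "stream processors" clocked at 1.35 GHz. CPUs usually run at higher clock rates but have far fewer execution units.
- Graphics cards are cheaper than high-end CPUs. For example, a GeForce 8800GT with 512 MB of memory currently costs about the same as a 2.4 GHz quad-core CPU.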

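Pros and Cons of GPGPU
Using a graphics chip also has some disadvantages:
- Graphics chips have a very large number of execution units, so they help little with work that cannot be highly parallelized.
- Graphics chips currently usually support only 32-bit floating-point numbers and mostly do not fully comply with IEEE 754, so some operations may be less accurate. Many current graphics chips have no separate integer units, so integer arithmetic is less efficient.
- Graphics chips usually lack complex flow-control hardware such as branch prediction, so programs with heavy branching run less efficiently.
- The GPGPU programming model is still immature and there is no accepted standard yet. For example, NVIDIA and AMD/ATI each have their own programming model.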

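Pros and Cons of GPGPU
Overall, a graphics chip behaves like a stream processor and is suited to performing a large amount of identical work at once. A CPU is more flexible and can handle more varied work at the same time.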

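CUDA Architecture
- CUDA is NVIDIA's GPGPU model. It is based on the C language, so programs that run on the graphics chip can be written directly in C, which most people already know, without learning chip-specific instructions or special structures.
- Under the CUDA architecture, a program is divided into two parts: the host side and the device side. The host side is the part that runs on the CPU, and the device side is the part that runs on the graphics chip. The device-side program is also called a "kernel".
- Typically the host program prepares the data and copies it to the graphics card's memory, the graphics chip then executes the device program, and finally the host program copies the results back from the graphics card's memory.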

CUDA Architecture (figure)