Applied Mathematics and Mechanics, Vol. 34, No. 9, Sep. 2013: Direct Numerical Simulation of the Wall-Bounded Turbulent Flow by Lattice Boltzmann Method Based on Multi-GPU


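Applied Mathematics and Mechanics, Vol. 34, No. 9, Sep. 15, 2013
Article ID: 1000-0887(2013)09-0956-09
ISSN 1000-0887

Direct Numerical Simulation of the Wall-Bounded Turbulent Flow by Lattice Boltzmann Method Based on Multi-GPU *

XU Ding, CHEN Gang, WANG Xian, LI Yue-ming
(Xi'an Jiaotong University, Xi'an 710049, P. R. China)

Abstract: Wall-bounded turbulent flow is simulated directly with the lattice Boltzmann method (LBM) on multiple GPUs (graphic processing units). The single-instruction multiple-thread (SIMT) execution model of the GPU matches the parallelism of LBM, so the LBM solver runs efficiently on the GPU and large-scale direct numerical simulation (DNS) becomes feasible. In this work 8 GPUs are used with 6.7 × 10^7 grid points, which gives a non-dimensional mesh size of Δ+ = 1.41 over the whole domain; 3 × 10^6 LBM steps take 24 h. The results agree with those of Moser et al., which verifies the lattice Boltzmann method for turbulence simulation.

Key words: lattice Boltzmann method; multi-GPU parallel computing; wall-bounded turbulent flow; DNS

CLC number: TB126; O351    Document code: A
DOI: 10.3879/j.issn.1000-0887.2013.09.009

Introduction

The lattice Boltzmann method (LBM) is an alternative to conventional solvers of the Navier-Stokes equations [1-2]. It has been applied to DNS and large-eddy simulation of turbulence [3-4]. The LBM results of this paper are compared with the DNS of turbulent channel flow at Re_τ = 180 by Moser et al. [5-6].

* Received 2013-05-30; revised 2013-06-05.
Foundation item: The National Natural Science Foundation of China (11242010; 11102150)
XU Ding (1980-), E-mail: dingxu@mail.xjtu.edu.cn; WANG Xian (1977-), E-mail: wangxian@mail.xjtu.edu.cn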

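The DNS of Moser et al. [5] used a grid of about 128^3 points with a spacing of Δ+ = 4.4, comparable to the Kolmogorov scale. DNS of this kind (KMM) on CPUs is very expensive, and the cost rises steeply with Re, so DNS has been limited to low Reynolds numbers.

GPUs programmed with CUDA or OpenCL offer an alternative. General-purpose computing on the GPU (GPGPU, general-purpose graphics processing unit) has been pursued since about 2003, and Nvidia later released CUDA [7]. A GPU of the Tesla Kepler generation (Tesla K20 series) has 2 688 cores and a peak performance of 3.95 TFlops, and it can be programmed in C. For Navier-Stokes solvers based on the MAC (marker and cell) method, GPU speed-ups of 30 ~ 40 have been reported, while for Boltzmann-type methods such as LBM the speed-up can exceed 100 [8-11]. Since 2006, GPU and multi-GPU computations have reached 10^12 ~ 10^15 Flops [12-15]. In this paper a multi-GPU LBM solver is applied to the DNS of turbulent channel flow at Re_τ = 180 in order to verify LBM on the GPU.

1 Lattice Boltzmann Method

The Boltzmann equation with the BGK collision operator (Boltzmann-BGK equation), discretized in velocity space, reads

$$\frac{\partial f_i}{\partial t} + \mathbf{e}_i \cdot \nabla f_i = -\frac{1}{\lambda}\left(f_i - f_i^{eq}\right), \tag{1}$$

where f_i is the distribution function, f_i^eq is the equilibrium distribution function, λ is the relaxation time, e_i is the discrete velocity, i = 0, 1, 2, ..., N - 1, and N is the number of discrete velocities. N = 9 for the two-dimensional model D2Q9, and N = 13, 15, 19, 27 for the three-dimensional models D3Q13, D3Q15, D3Q19 and D3Q27. The equilibrium distribution function is

$$f_i^{eq} = \rho\,\omega_i\left[1 + 3\,\mathbf{e}_i\cdot\mathbf{u} + \frac{9}{2}\left(\mathbf{e}_i\cdot\mathbf{u}\right)^2 - \frac{3}{2}\,\mathbf{u}\cdot\mathbf{u}\right]. \tag{2}$$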

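In Eq. (2), ω_i is the weight coefficient. For the D3Q19 model (Fig. 1), ω_0 = 1/3, ω_1 ~ ω_6 = 1/18 and ω_7 ~ ω_18 = 1/36. ρ and u are the macroscopic density and velocity:

$$\rho = \sum_{i=0}^{N-1} f_i = \sum_{i=0}^{N-1} f_i^{eq}, \qquad \rho\mathbf{u} = \sum_{i=0}^{N-1} f_i\,\mathbf{e}_i = \sum_{i=0}^{N-1} f_i^{eq}\,\mathbf{e}_i. \tag{3}$$

Fig. 1 D3Q19-LBM model

In LBM the speed of sound is c_s = 1/√3, and the pressure is

$$p = \rho c_s^2 = \rho/3. \tag{4}$$

Discretizing Eq. (1) in space x and time t gives the LBGK equation

$$f_i(\mathbf{x} + \mathbf{e}_i\Delta t,\ t + \Delta t) - f_i(\mathbf{x}, t) = -\frac{1}{\tau}\left[f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t)\right], \tag{5}$$

where τ = λ/Δt is the non-dimensional relaxation time. The kinematic viscosity is

$$\nu = \frac{1}{3}\left(\tau - \frac{1}{2}\right)\Delta t. \tag{6}$$

With Δt = 1, Eq. (5) is split into a collision step and a streaming step:

$$\tilde f_i(\mathbf{x}, t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t)\right], \tag{7}$$

$$f_i(\mathbf{x} + \mathbf{e}_i,\ t + 1) = \tilde f_i(\mathbf{x}, t), \tag{8}$$

where \tilde f_i is the post-collision distribution function. On a boundary node, \tilde f_i = f_i^eq + f_i^neq is approximated by f_i^eq + f_i^neq_neighbor: f_i^eq is computed from the boundary values of u and ρ with Eq. (2), and the non-equilibrium part f_i^neq is replaced by that of the neighboring node, f_i^neq_neighbor = \tilde f_i_neighbor - f_i^eq_neighbor, so that

$$\tilde f_i = f_i^{eq} + \tilde f_{i,\mathrm{neighbor}} - f_{i,\mathrm{neighbor}}^{eq}. \tag{9}$$

The body force is included in the LBE as in Ref. [16]: the velocity used in f_i^eq of Eq. (2) is shifted according to

$$\rho\mathbf{u}^{eq} = \rho\mathbf{u} + \tau\mathbf{G}, \tag{10}$$

where G is the body force, and u^eq replaces u in Eq. (2) when f_i^eq is evaluated.

2 Comparison of the NS and Boltzmann Solvers on the GPU

To compare the GPU performance of an NSE solver and LBM, the two-dimensional flow around a cylinder is computed on a 1 024 × 512 grid. The GPU is a GeForce GTX280 (single precision) and the CPU is an Intel Xeon E5420 (2.5 GHz). The NSE solver uses a Red-Black scheme for the Poisson equation. Table 1 shows that the overall GPU speed-up of the NSE solver over the CPU is 13.7. The Poisson equation takes 82% of the GPU time and 57% of the CPU time, so the Poisson solver is the bottleneck of the NS solver on the GPU, whereas the advection-diffusion step takes only 8.8% of the GPU time against 36% of the CPU time.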
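The D3Q19 relations of Section 1 can be checked numerically. The following is a minimal pure-Python sketch, not the authors' GPU code: it builds the D3Q19 velocity set with the weights quoted above, evaluates Eq. (2), recovers ρ and u through Eq. (3), and applies the BGK collision of Eq. (7). The ordering of the velocity vectors and all function names are assumptions made for this illustration.

```python
from itertools import product

# D3Q19 velocity set: 1 rest vector, 6 axis vectors, 12 face-diagonal vectors.
# The ordering within each group is an assumption of this sketch.
E = [(0, 0, 0)]
E += [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 1]
E += [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]
# Weights from the text: w0 = 1/3, w1..w6 = 1/18, w7..w18 = 1/36.
W = [1 / 3] + [1 / 18] * 6 + [1 / 36] * 12


def equilibrium(rho, u):
    """Eq. (2): f_i^eq = rho * w_i * [1 + 3 e.u + 4.5 (e.u)^2 - 1.5 u.u]."""
    uu = sum(c * c for c in u)
    feq = []
    for e, w in zip(E, W):
        eu = sum(a * b for a, b in zip(e, u))
        feq.append(rho * w * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * uu))
    return feq


def moments(f):
    """Eq. (3): density and velocity as moments of the distributions."""
    rho = sum(f)
    u = tuple(sum(fi * e[k] for fi, e in zip(f, E)) / rho for k in range(3))
    return rho, u


def collide(f, tau):
    """Eq. (7): relax every f_i towards its local equilibrium."""
    rho, u = moments(f)
    feq = equilibrium(rho, u)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def viscosity(tau, dt=1.0):
    """Eq. (6): nu = (1/3) (tau - 1/2) dt."""
    return (tau - 0.5) * dt / 3


rho, u = moments(equilibrium(1.0, (0.1, 0.0, 0.0)))
print(rho, u, viscosity(0.6))
```

The collision step conserves mass and momentum because the zeroth and first moments of f_i^eq equal those of f_i, which is what the moment check demonstrates.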

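Table 2 gives the corresponding results for LBM on the GPU. The overall speed-up is 87.4; the collision step, Eq. (7), reaches 124.2 and the streaming step, Eq. (8), reaches 62.9. The entry 4.28 + 1.87 in Table 2 means 4.28 s for the collision and 1.87 s for the boundary condition. LBM is therefore much better suited to the GPU than the NSE solver.

Table 1 Elapsed time, performance and speed-up of CPU & GPU on simulating 2D flow around a cylinder by NSE (1 024 × 512)

                        CPU                             GPU
item                    t /ms (share)   P /GFlops       t /ms (share)   P /GFlops     speed-up
overall                 282.12          1.24            20.65           16.89         13.7
advection-diffusion     102.43 (36%)    1.15            1.83 (8.8%)     64.23         56.0
divergence U            3.97            0.79            0.18            17.14         21.7
Poisson                 160.41 (57%)    1.32            16.94 (82%)     12.49         9.5
gradient p              6.48            0.65            0.26            16.03         24.8

Table 2 Elapsed time, performance and speed-up of CPU & GPU on simulating 2D flow around a cylinder by LBE (1 024 × 512)

                                                CPU                     GPU
item                                            t /s      P /GFlops     t /s                  P /GFlops    speed-up
overall                                         1 345.7   0.57          15.40                 50.34        87.4
collision (including BC)                        763.6     0.88          6.15 (4.28 + 1.87)    109.10       124.2
streaming (including macro value computation)   582.1     0.18          9.25                  11.30        62.9

3 Calculation Model of the Channel Flow

The flow between two plates is shown in Fig. 2, where x, y and z are the streamwise, wall-normal and spanwise directions, L_x and L_z are the streamwise and spanwise lengths of the domain, and δ is the half-width of the channel. The Reynolds number based on the centerline velocity u_c is Re_c = δu_c/ν. The viscous length scale is l_τ = ν/u_τ, the friction velocity is u_τ = √(Gδ/ρ), and the viscous time scale is t_τ = l_τ/u_τ. The friction Reynolds number is Re_τ = δ/l_τ = δu_τ/ν = δ+.

Jimenez and Moin [17] studied how small the computational box of a channel flow can be; the box is characterized by the aspect ratios a_x = L_x/δ and a_z = L_z/δ [17]. Spasov et al. [18] simulated the channel flow at Re_τ = 180 with LBM using a_x = 4 and a_z = 1 and compared the results with Moser et al. [5]. In the present work a_x = 8 and a_z = 2 are used.

DNS has to resolve the flow down to the Kolmogorov scale η, so the mesh size Δ must not exceed η; in wall units, η+ = 1.5 [19].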
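The speed-ups and time shares quoted from Tables 1 and 2 follow directly from the elapsed times. The sketch below recomputes them as a consistency check; the dictionary layout and function names are illustrative, and only the numbers come from the tables.

```python
# (CPU time, GPU time) pairs copied from Table 1 (ms) and Table 2 (s).
TABLE1_MS = {
    "overall": (282.12, 20.65),
    "advection-diffusion": (102.43, 1.83),
    "Poisson": (160.41, 16.94),
}
TABLE2_S = {
    "overall": (1345.7, 15.40),
    "collision": (763.6, 6.15),
    "streaming": (582.1, 9.25),
}


def speed_up(cpu_time, gpu_time):
    """Speed-up of the GPU over the CPU for the same work."""
    return cpu_time / gpu_time


def share(part, total):
    """Fraction of the total elapsed time spent in one step."""
    return part / total


nse_overall = speed_up(*TABLE1_MS["overall"])
lbe_overall = speed_up(*TABLE2_S["overall"])
poisson_gpu_share = share(TABLE1_MS["Poisson"][1], TABLE1_MS["overall"][1])
poisson_cpu_share = share(TABLE1_MS["Poisson"][0], TABLE1_MS["overall"][0])

print(f"NSE overall speed-up: {nse_overall:.1f}")
print(f"LBE overall speed-up: {lbe_overall:.1f}")
print(f"Poisson share on GPU: {poisson_gpu_share:.0%}, on CPU: {poisson_cpu_share:.0%}")
```

The check makes the bottleneck argument concrete: the Poisson step dominates the GPU time of the NSE solver, while both LBM steps accelerate by well over an order of magnitude.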

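The mesh therefore has to satisfy Δ+ < η+ = 1.5. In LBM Δ = 1, so Δ+ = Δ/l_τ = Re_τ/δ < 1.5, that is, δ > Re_τ/1.5. For Re_τ = 180 this requires δ > 120. In this paper the domain for Re_τ = 180 is L_x × L_y × L_z = 1 024 × 256 × 256, which gives Δ+ = 1.41.

The friction velocity u_τ is obtained from

$$\frac{u_c}{u_\tau} = \frac{1}{\kappa}\ln Re_\tau + b,$$

where κ is the von Kármán constant, κ = 0.4, b = 6, and u_c = 0.1.

The initial field is u = ū + u', v = v̄ + v', w = w̄ + w' and ρ = ρ_0, with v̄ = w̄ = 0 and ū given by

$$u^+ = \begin{cases} y^+, & y^+ \le y_w, \\ \dfrac{1}{\kappa}\ln y^+ + b, & y^+ > y_w, \end{cases}$$

where u+ = ū/u_τ, y_w = 11.6 and ρ_0 = 1. The initial fluctuations are u' = v' = w' = u_c(r_rand - 0.5), where r_rand is a random number. The walls are treated with Eq. (9), and the flow is driven by the body force of Ref. [16]. The characteristic time is t_f = a_xδ/u_τ. The computation runs for 3 × 10^6 LBM steps, and Fig. 3 shows the history of the average velocity at the central plane.

Fig. 2 Calculation model for flow between plates
Fig. 3 Profile of average velocity with time at the central plane

4 Multi-GPU Parallel Computing of LBM

With the D3Q19 model, a GPU with 3 GB of memory can hold on the order of 10^7 grid points, so the 6.7 × 10^7 mesh of this work needs at least 7 GPUs. Data exchanged between GPUs has to pass through the CPUs. Following Ref. [11], 8 NVIDIA Tesla M2050 GPUs are used; MPI (message passing interface) handles the communication between processes, and the CUDA function cudaMemcpy handles the transfer between GPU and CPU [11]. The domain is decomposed in one dimension along y into slabs parallel to the x-z plane, so that each GPU handles 1 024 × 32 × 256 grid points (Fig. 4). Only the D3Q19 distribution functions that cross a subdomain interface have to be exchanged [11].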
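The resolution requirement and the slab decomposition reduce to a few lines of arithmetic. The sketch below reproduces the numbers quoted in the text (δ > 120, Δ+ = 1.41, 6.7 × 10^7 grid points, 1 024 × 32 × 256 per GPU). It assumes that the half-width δ equals L_y/2 in lattice units, which is consistent with the quoted Δ+; the function names are illustrative.

```python
RE_TAU = 180                  # friction Reynolds number of the simulation
ETA_PLUS = 1.5                # Kolmogorov scale in wall units, from the text
LX, LY, LZ = 1024, 256, 256   # domain size in lattice units
N_GPUS = 8


def min_half_width(re_tau, eta_plus):
    """Smallest half-width (in lattice units) satisfying delta+ < eta+."""
    return re_tau / eta_plus


def delta_plus(re_tau, half_width, dx=1.0):
    """Mesh size in wall units: dx / l_tau = dx * Re_tau / delta."""
    return dx * re_tau / half_width


def slab_shape(lx, ly, lz, n_gpus):
    """One-dimensional decomposition along y into x-z slabs."""
    if ly % n_gpus:
        raise ValueError("LY must be divisible by the number of GPUs")
    return (lx, ly // n_gpus, lz)


half_width = LY // 2          # assumption: delta is half the wall-normal extent
n_nodes = LX * LY * LZ

print(min_half_width(RE_TAU, ETA_PLUS))          # lower bound on delta
print(round(delta_plus(RE_TAU, half_width), 2))  # mesh size in wall units
print(n_nodes, slab_shape(LX, LY, LZ, N_GPUS))
print(LX / half_width, LZ / half_width)          # aspect ratios a_x, a_z
```

Decomposing along y keeps every slab contiguous in x and z, so each GPU exchanges only two x-z planes with its neighbors per time step.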

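Fig. 4 One-dimensional domain decomposition

5 DNS Results

The 3 × 10^6 LBM steps take 24 h on 8 GPUs, which corresponds to 2.33 × 10^9 lattice updates per second. The computation of Ref. [18] on 36 CPUs reached 5.60 × 10^6, so the present computation is about 416 times faster than that of Ref. [18].

Figure 5 shows the iso-surface of the second invariant of the velocity gradient tensor. Figure 6 shows the velocity u averaged over x-z planes as a function of y, together with the DNS data of Moser et al. [5-6]. The cases a_x = 4, a_z = 1; a_x = 8, a_z = 1; and a_x = 4, a_z = 2 are included in the comparison with Moser et al.

Fig. 5 Iso-surface of second invariant of velocity gradient tensor
Fig. 6 Profile of average velocity in the normal y-direction

Figure 7 shows the Reynolds stress R_uv(y) (Fig. 7(a)) and the rms of the velocity fluctuations (Fig. 7(b) ~ (d)) together with the DNS data of Moser et al. [5], again for a_x = 4, a_z = 1; a_x = 8, a_z = 1; and a_x = 4, a_z = 2.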
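The throughput figures quoted for the 8-GPU run can be reproduced from the mesh size, the step count and the wall-clock time. This is a small check rather than a benchmark; MLUPS here means million lattice updates per second, and the variable names are illustrative.

```python
def mlups(n_nodes, n_steps, seconds):
    """Million lattice updates per second for a whole run."""
    return n_nodes * n_steps / seconds / 1e6


N_NODES = 1024 * 256 * 256        # mesh of the Re_tau = 180 case
N_STEPS = 3_000_000               # LBM steps
WALL_CLOCK_S = 24 * 3600          # 24 h on 8 GPUs

gpu_rate = mlups(N_NODES, N_STEPS, WALL_CLOCK_S)

# Ratio of the two rates quoted in the text (lattice updates per second).
ratio_to_cpu = 2.33e9 / 5.60e6

print(f"{gpu_rate:.0f} MLUPS, {ratio_to_cpu:.0f} times the CPU rate")
```

The result, about 2 330 MLUPS, matches the 2.33 × 10^9 updates per second given in the text, and the ratio of the two quoted rates gives the factor of 416.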

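(a) R_uv(y) = \overline{u'v'}/u_τ^2    (b) u_rms(y) = u'/u_τ    (c) v_rms(y) = v'/u_τ    (d) w_rms(y) = w'/u_τ
Fig. 7 Profiles of Reynolds stress R_uv(y) and rms of velocity fluctuations u_rms(y) in the normal y-direction

6 Conclusions

1) The DNS results obtained with the lattice Boltzmann method agree with the DNS of Moser et al., which verifies the lattice Boltzmann method for the simulation of turbulence.
2) With multi-GPU parallel computing, the Boltzmann DNS with 0.67 × 10^8 grid points and 300 × 10^4 LBM steps takes only 24 h. The performance is 2 330 MLUPS, that is, more than 2 × 10^9 lattice updates per second.
3) The comparison of the NSE and LBE solvers on the GPU shows that LBM is better suited to the GPU. Multi-GPU LBM is a promising approach for CFD and DNS.

References

[1] Chen S Y, Doolen G D. Lattice Boltzmann method for fluid flows[J]. Annual Review of Fluid Mechanics, 1998, 30: 329-364.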

[2] HE Ya-ling, WANG Yong, LI Qing. Lattice Boltzmann Method: Theory and Applications[M]. Beijing: Science Press, 2009. (in Chinese)
[3] Yu H, Girimaji S S, Luo L S. DNS and LES of decaying isotropic turbulence with and without frame rotation using lattice Boltzmann method[J]. Journal of Computational Physics, 2005, 209(2): 599-616.
[4] Yu H, Luo L S, Girimaji S S. LES of turbulent square jet flow using an MRT lattice Boltzmann model[J]. Computers & Fluids, 2006, 35(8/9): 957-965.
[5] Moser R D, Kim J, Mansour N N. Direct numerical simulation of turbulent channel flow up to Re_τ = 590[J]. Phys Fluids, 1999, 11(4): 943-945.
[6] Kim J, Moin P, Moser R D. Turbulence statistics in fully developed channel flow at low Reynolds number[J]. J Fluid Mech, 1987, 177: 133-166.
[7] Nvidia. NVIDIA CUDA Programming Guide[K]. Version 2.0, 2008.
[8] Ogawa S, Aoki T. GPU computing for 2-dimensional incompressible-flow simulation based on multi-grid method[C]//Transactions of JSCES, Paper No 20090021, 2009.
[9] Harada T. Smoothed particle hydrodynamics on GPUs[C]//Proceeding of the Spring Conference on Computer Graphics, 2007: 235-241.
[10] Rossinelli D, Bergdorf M, Cottet G-H, Koumoutsakos P. GPU accelerated simulations of bluff body flows using vortex particle methods[J]. Journal of Computational Physics, 2010, 229(9): 3316-3333.
[11] Wang X, Aoki T. Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster[J]. Parallel Computing, 2011, 37(9): 521-535.
[12] Shimokawabe T, Aoki T, Takaki T, Endo T, Yamanaka A, Maruyama N, Nukada A, Matsuoka S. Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer[C]//Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. New York, USA, 2011.
[13] Shimokawabe T, Aoki T, Ishida J, Kawano K, Muroi C. 145 TFlops performance on 3990 GPUs of TSUBAME 2.0 supercomputer for an operational weather prediction[C]//Proceedings of the International Conference on Computational Science, ICCS 2011, 2011, 4: 1535-1544.
[14] Wang X, Aoki T. High performance computation by multi-node GPU cluster-TSUBAME 2.0 on the air flow in an urban city using lattice Boltzmann method[J]. International Journal of Aerospace and Lightweight Structures, 2012, 2(1): 77-86.
[15] Miki T, Wang X, Aoki T, Imai Y, Ishikawa T, Takase K, Yamaguchi T. Patient-specific modeling of pulmonary air flow using GPU cluster for the application in medical practice[J]. Computer Methods in Biomechanics and Biomedical Engineering, 2012, 15(7): 771-778.
[16] Lammers P, Beronov K N, Volkert R, Brenner G, Durst F. Lattice BGK direct numerical simulation of fully developed turbulence in incompressible plane channel flow[J]. Computers & Fluids, 2006, 35(10): 1137-1153.
[17] Jimenez J, Moin P. The minimal flow unit in near-wall turbulence[J]. Journal of Fluid Mechanics, 1991, 225(1): 213-240.
[18] Spasov M, Rempfer D, Mokhasi P. Simulation of turbulent channel flow with an entropic lattice Boltzmann method[J]. Int J Numer Meth Fluids, 2009, 60(11): 1240-1258.
[19] Pope S B. Turbulent Flows[M]. Cambridge: Cambridge University Press, 2000.

Direct Numerical Simulation of the Wall-Bounded Turbulent Flow by Lattice Boltzmann Method Based on Multi-GPU

XU Ding, CHEN Gang, WANG Xian, LI Yue-ming
(State Key Laboratory for Strength and Vibration of Mechanical Structures, School of Aerospace, Xi'an Jiaotong University, Xi'an 710049, P. R. China)

Abstract: The wall-bounded turbulent flow was simulated directly (DNS) by the lattice Boltzmann method (LBM) through multi-GPU parallel computing. The data-parallel SIMT (single-instruction multiple-thread) characteristic of the GPU matched the parallelism of LBM well, which led to high efficiency of the GPU on the LBM solver. At the same time, it made large-scale DNS possible on a desktop supercomputer. In this DNS work, 8 GPUs were adopted. The number of meshes was 6.7 × 10^7, which resulted in a non-dimensional mesh size of Δ+ = 1.41 for the whole solution domain. It took only 24 hours for the GPU-LBM solver to simulate 3 × 10^6 LBM steps. As a result, both the mean velocity and the turbulent variables, such as Reynolds stress and velocity fluctuations, agree well with the results of Moser et al. The capacity and validity of LBM in simulating turbulent flow are verified.

Key words: lattice Boltzmann method; multi-GPU parallel computing; wall-bounded turbulent flow; DNS

Foundation item: The National Natural Science Foundation of China (11242010; 11102150)