GreenLA: Green Linear Algebra Software for GPU-Accelerated Heterogeneous Computing Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang,

Size: px
Start display at page:

Download "GreenLA: Green Linear Algebra Software for GPU-Accelerated Heterogeneous Computing Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang,"

Transcription

1 GreenLA: Green Linear Algebra Software for GPU-Accelerated Heterogeneous Computing Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi Bhuyan, and Zizhong Chen University of California, Riverside, Clemson University Ultrascale Systems Research Center, Los Alamos National Laboratory {jchen098, pwu011, dtao001, hli035, xlian007, sli049, bhuyan, Abstract While many linear algebra libraries have been developed to optimize their performance, no linear algebra library considers their energy efficiency at the library design time. In this paper, we present GreenLA - an energy efficient linear algebra software pacage that leverages linear algebra algorithmic characteristics to maximize energy savings with negligible overhead. GreenLA is (1) energy efficient: it saves up to several times more energy than the best existing energy saving approaches that do not modify library source codes; (2) high performance: its performance is comparable to the highly optimized linear algebra library MAGMA; and (3) transparent to applications: with the same programming interface, existing MAGMA users do not need to modify their source codes to benefit from GreenLA. Experimental results demonstrate that GreenLA is able to save up to three times more energy than the best existing energy saving approaches while delivering similar performance compared to the state-of-the-art linear algebra library MAGMA. Index Terms energy, performance, critical path, algorithmic slac prediction, DVFS, CPU, GPU, dense matrix factorizations I. INTRODUCTION Scientific applications running nonstop on large-scale High Performance Computing (HPC) systems must efficiently use energy for execution. Comprising millions of components, today s HPC systems already consume megawatts of power; to meet an insatiate demand for performance from missioncritical applications, future systems will consist of even more components and consume more power [3]. Efficient use of energy by scientific applications not only reduces energy costs but also allows greater performance under a given power budget and improves system reliability. Scientific applications can greatly benefit from heterogeneous computing technologies for energy efficiency. Heterogeneous computing systems are accelerated with manycore processing units. Such accelerators, including GPUs and coprocessors, consist of hundreds to thousands of lowpower cores, delivering highly power efficient computing. By offloading massively parallel and compute intensive ernels to Authors contributed equally; Corresponding author. SC16; Salt Lae City, Utah, USA; November /16/$31.00 c 2016 IEEE accelerators, scientific applications can achieve better performance while consuming less energy. Today, five of the top ten [6] computing systems are accelerated with either Intel Xeon Phi coprocessors or NVIDIA GPUs. Moreover, libraries and pacages commonly used by scientific applications begin to have heterogeneous computing versions, such as MAGMA [4] numerical linear algebra library. Improving the energy efficiency of commonly used libraries is an effective approach to energy efficient scientific computing. Unfortunately, existing libraries are focused on performance, inconsiderate of energy savings opportunities that do not adversely impact performance. For example, MAGMA decomposes a program to tass and schedules sequential and less parallelizable tass on CPU and larger more parallelizable ones on GPU. Consequently, MAGMA achieves better performance than its counterpart libraries for homogeneous CPU computing. Yet, inherent in the DAG-based tas scheduling in MAGMA, processing units scheduled with tass on noncritical paths unavoidably experience idle time, i.e., slac. The slac can be further exploited for energy savings by leveraging hardware power-aware techniques including Dynamic Voltage and Frequency Scaling (DVFS). DVFS has been used to save energy on CPU by scaling down CPU speed during underused execution phases [16] [30] [29] [31], and now is also available on memory [13] [14] and GPU cards [25] [5]. Numerical linear algebra libraries are used in a wide spectrum of high performance scientific applications. These libraries solve systems of linear equations, linear least square problems, and eigenvalue/eigenvector problems. Among numerical linear algebra operations, dense matrix factorizations can sometimes tae a large portion of execution time or even dominate the whole scientific application execution. This paper presents GreenLA - an energy efficient linear algebra software pacage for heterogeneous scientific computing on GPU-accelerated systems. At the initial stage of the project, we analyzed highly optimized dense matrix factorization algorithms including Cholesy, LU and QR factorizations. Then we developed GreenLA to exploit algorithmic nowledge of linear algebra operations to predict slac on CPU and GPU, and use application-level DVFS strategies to reclaim the slac for energy savings. Compared to OS level solutions that

2 rely on online learning and prediction for DVFS scheduling decisions, GreenLA accurately pinpoints and fully reclaims the slac, achieving more energy savings with less overhead. As a software pacage, GreenLA can wor in place of existing numerical linear algebra library MAGMA. Moreover, GreenLA can be freely enabled or disabled, less intrusive than the OS level solutions. The main contributions of this paper are: This paper develops GreenLA - an energy efficient linear algebra software pacage that effectively leverages the algorithmic characteristics of the linear algebra operations to maximize energy savings. GreenLA exploits linear algebra algorithmic nowledge combined with light-weight online profiling to accurately predict the length of tass and slacs, and hence can maximize the reclamation of slacs via algorithm-based DVFS scheduling. GreenLA saves up to three times more energy than the best existing energy saving approaches that do not modify the library source codes; GreenLA achieves comparable performance to the highly optimized linear algebra library MAGMA but needs less energy than MAGMA. GreenLA is transparent to applications. With the same programming interface as the existing library MAGMA, existing MAGMA users do not need to modify their source codes to benefit from GreenLA. The remainder of this paper is organized as follows. Section II introduces bacground nowledge. We present GreenLA design in Section III and its implementation in Section IV. Evaluation methodology and experimental results are provided in Sections V and VI respectively. Section VII discusses the related wor and Section VIII concludes the paper. II. DENSE MATRIX FACTORIZATIONS ON CPU-GPU HETEROGENEOUS SYSTEMS Dense matrix factorizations solve systems of linear equations, linear least square problems, and eigenvalue/eigenvector problems, etc. The commonly used algorithms include Cholesy, LU and QR factorizations. These algorithms decompose the coefficient matrix A in a linear system Ax = b into simpler forms, such as LL T and P LU. Consequently the solution x can be calculated using forward and bacward substitutions. Widely accepted heterogeneous computing libraries including MAGMA [4] provide routines of these matrix factorizations as standard functionality. Although each suitable for different problem classes, Cholesy, LU and QR factorizations share similar algorithmic characteristics. Figure 1 illustrates the main steps of highlyoptimized dense matrix factorizations on CPU-GPU heterogeneous systems in a global view. Factorizing a panel matrix is a Level 2 BLAS [1] operation and involves diagonal matrices factorization and panel factorization. Due to the low computational complexity and high sequentiality, panel factorization is performed on the CPU, shown as the green/yellow boxes. Updating a trailing matrix is a Level 3 BLAS operation with high computation complexity and high degrees of parallelism. It is performed on the GPU for performance efficiency, shown as the white boxes. Figure 1 also demonstrates how a dense matrix factorization proceeds on a CPU-GPU platform with data movement between CPU and GPU in a local view. As mentioned, factorizing the panel matrices is executed on the CPU; and updating the trailing matrix is massively parallelized on the GPU. The panel matrices calculated on the CPU are offloaded to the GPU and used by the GPU to update the trailing matrices. For the sae of performance, the next panel matrix that is updated on the GPU is immediately copied bac to the CPU before the entire trailing matrix finishes. As such, panel factorization is simultaneously executed on the CPU as the rest of trailing matrix is updated on the GPU. These processes proceed by a submatrix bloc, starting from the left upper corner of the global matrix and finishing when the whole global matrix is fully factorized. III. GREENLA ENERGY SAVING METHODOLOGY At the coarse level, the matrix factorization algorithms repeatedly assign two dependent types of tass to CPU and GPU respectively. In each iteration slac presents on both CPU and GPU. GreenLA reclaims the slac for energy savings with three main components. First, it identifies the critical path and slac on the non-critical paths on the CPU and GPU by analyzing the heterogeneous algorithms. Second, it uses algorithmic nowledge to predict and quantify the slac. Third, it exploits DVFS on the CPU and GPU to fully reclaim the slac on the non-critical paths for energy savings. The following subsections present detailed design of each components. A. Critical Path and Slac Analysis For tas-parallel applications, a slac refers to a time period when one computer component idly waits for another. Typical causes of slac include load imbalance, inter-tas or interprocess communication, and memory access stalls. Due to the pervasive presence in applications and systems, slac has been recognized as important energy saving opportunities in HPC. In a tas-parallel application, a Critical Path (CP) is a particular sequence of tass spanning from the beginning to the end of the execution where the total slac amounts to zero. While slac on the non-critical paths is usually exploited for energy savings, it is non-trivial to fully reclaim them without impacting application performance [32]. As shown in Figure 1, slac is present on the CPU and GPU in the heterogeneous matrix factorization algorithms. Specifically, the CPU must wait for the next updated panel from the GPU, and the GPU must wait for the factorized panel from the CPU. In addition, either the CPU waits for the GPU to finish updating the trailing matrix or the GPU waits for the CPU to finish factorizing the panel matrix. Moreover, the slac varies over iterations as the tas sizes change. Being able to accurately quantify and predict the slac is necessary before reclaiming them for optimal energy savings.

3 Fig. 1. Global View and Local View of Dense Matrix Factorizations on CPU-GPU Heterogeneous Systems. B. Online Algorithmic Slac Prediction For energy saving purposes, we must first now where and when the slac occurs and for how long they last. Given an application, one method to obtain such nowledge is to instrument the source code with timing functions and run the instrumented program with various problem sizes to collect the profiles. Alternatively, OS-level profiling can be performed with hardware performance counters on processors. Neither method is portable, and both methods require extensive profiling. In GreenLA, we investigate an online algorithmic slac prediction approach that accurately predicts the varying slac at runtime with minimum profiling. Given the heterogeneous matrix factorization algorithms, the slac on the CPU and GPU is mainly impacted by the software and hardware parameters. Problem size: the sizes of the panel matrix and trailing matrix scheduled on the CPU and GPU respectively based on the algorithmic characteristics; CPU compute capacity: the number of floating point operations the CPU are able to perform in one second for the assigned tass; GPU compute capacity: the number of floating point operations the GPU are able to performance in one second for the assigned tass. Data transfer speed: the number of bytes the GPU are able to transfer between CPU memory and GPU memory. Figure 2 plots the difference between CPU and GPU tas execution time for the first 100 iterations of LU factorization with various problem sizes and compute rates. For instance, and are two global matrix sizes, and high low means that the CPU runs at the highest speed and the GPU runs at the lowest speed. A non-zero difference between the execution time indicates that either CPU or GPU waits for the other in the current iteration. Appropriately adjusting the compute rate of CPU or GPU can narrow and even eliminate the time difference. In GreenLA, we leverage our prior nowledge about the factorization algorithms to quantitatively predict the slac be- Fig. 2. The difference between CPU and GPU execution time for the first 100 iterations of LU Factorization. The difference varies over iterations, and is a function of problem size and the CPU and GPU compute rates. tween CPU and GPU in each iteration. Two major factors that will affect the behavior of slac are the CPU/GPU execution time and the data copy time between CPU and GPU. We first focus on the execution time. We use LU factorization as an example here. It would be similar for Cholesy and QR factorization. Assuming the execution time of the first iteration of a N N with bloc size nb factorization are nown, we denote the CPU time for the N nb panel factorization as T0 CP U, and the GPU time for the N N trailing updating as T0 GP U where N = N nb. We can use the algorithmic information to predict the execution time of the remaining iterations. For now we consider fixed compute capacity for the CPU and GPU. As the time complexities of panel factorization and trailing updating are O(N 3 ) for a matrix size N, the execution time for iteration and + 1 have the following relation. T CP U +1 T CP U T GP U +1 T GP U = O(N3 +1 ) O(N 3 ) = O(N3 +1 ) O(N 3 ) (N ( + 1) nb) nb2 = (N nb) nb 2 nb = 1 (1) N nb = (N ( + 1) nb)2 nb (N nb) 2 nb ( ) 2 nb = 1 (2) N nb

4 Similarly, we can also predict the data copy time between CPU and GPU. Assuming the data copy time of the first iteration is T0 COP Y for transferring a panel to GPU from CPU and then transfer it bac. Assuming the data transfer speed is constant, we can use T0 COP Y and the size change of panel over iterations to predict future data copy time. In LU factorization, the space complexity of panel needed to be transferred is O(N 2 ), the data copy time for iteration and + 1 have the following relation: T COP Y +1 T COP Y = O(N2 +1 ) O(N 2 ) (N ( + 1)) nb2 = (N ) nb 2 = 1 1 N Since, CPU must wait for data copy is done before it can starts it computation, the slac during iteration can be quantified as follows: slac = T CP U + T COP Y T GP U N ; 0 < nb A positive value indicates that the GPU have slac time to wait for the CPU, while a negative value indicates that the CPU has slac to wait for the GPU. Value zero means that the CPU and GPU finish the current iterations at the same time. The initial slac is the time difference for the first iteration, i.e., = 0. slac 0 = T CP U 0 + T0 COP Y T0 GP U (3) (4) (5) From Equations 1-4, we can quantify the slac for the rest CPU-GPU tass, provided with the execution time of the first iteration. This algorithmic slac prediction is accurate and lightweight compared to OS level slac prediction. By using the prior algorithmic nowledge, only minimal profiling is necessary. C. CP-Aware Slac Reclamation With predicted slac over the iterations, we explore CPaware slac reclamation to save energy [22] [28] [32]. To adjust the time it taes for CPU or GPU, we can either adjust the computation time or data copy time. However, since it is usually hard to accurately adjust data transfer speed and it also brings much less energy saving benefit than computation, we focus on adjusting the execution time. Specifically, we employ properly reduced compute capability on the processing units on the non-critical path such that the total slac amount to zero. We exploit DVFS, an effective power-saving hardware technology available on the CPU and GPU for this purpose. DVFS-capable processing units have multiple performance/power states. Prior studies [17] [9] [7] show that running these processing units at lower states during slac significantly reduces power and energy without impacting overall application performance. As shown in Figure 2, with different inputs and power states, even within one run, slac can reside at either CPU side or GPU side. We apply accordingly either CPU DVFS or GPU DVFS depending on the source of slac. Fig. 3. Offline and Online Framewor of GreenLA. In order to quantitatively evaluate GreenLA, we theoretically analyze the upper bound of possible energy savings for heterogeneous matrix decomposition algorithms. Theoretically, the maximum energy savings is as follows: E sys = E new E old = (P new s (P old s + Pd new + P other ) T new + Pd old (6) + P other ) T old T new = T new + T DV F S (7) Here, P s and P d are the static power and dynamic power respectively consumed by the CPU and GPU, where P d is a function of processor frequency for DVFS-capable components. P other is the power consumption of other computer components. T new is the execution time of GreenLA including the overhead brings by the DVFS, while T old is the execution time of the original heterogeneous factorization. We denote P d = np total, P s + P other = (1 n)p total, where n is a ratio of dynamic CPU/GPU power P d within the total system power costs. By assuming T new = T old and adopting P d = f 2.4 from [15], we can simplify Equation 7 as: E sys = np total fnew 2.4 f old + (1 n)ptotal np total + (1 n)p total = 1 n 1 nb N nb ( min(t CP U, T GP U ) N max(t CP U, T GP U ) 1 ( ) = 1 n 1 fnew 2.4 f old ) 2.4 IV. GREENLA DESIGN AND IMPLEMENTATION We present the design and implementation details of GreenLA, including strict and relaxed slac reclamation and coupled GPU DVFS. Figure 3 illustrates the architecture and main components of GreenLA. It minimally profiles the application offline to obtain the execution time of the first CPU-GPU tass at different power states and first data copy time. Such data is used (8)

5 to derive the slac time during the first iteration. GreenLA uses online algorithmic slac prediction to accurately obtain the slac for the rest of the CPU-GPU tass. In cases where an available frequency is unable to eliminate a slac, we split the slac and use two consecutive available frequencies. tas f l f h f l f ideal T T x slac f lower f upper fset CP LastF req r TABLE I NOTATION IN ALGORITHMS AND FORMULATION. One tas of CPU-GPU dense matrix factorizations The lowest CPU/GPU core frequency set by DVFS The highest CPU/GPU core frequency set by DVFS The lowest GPU core frequency paired with the highest GPU memory frequency set by DVFS The optimal ideal frequency to eliminate slac Execution time of a tas running at f h Execution time of a tas running at f x Amount of time that a tas can be delayed by w/o increasing the total runtime of the application The neighboring frequency smaller than f ideal The neighboring frequency greater than f ideal The frequency set consisting of all used frequencies One tas trace consisting of tass to finish the application with the total slac of zero Frequency used after the last frequency scaling Ratio between two durations at split frequencies A. Strict Slac Reclamation With strict slac reclamation, we apply DVFS on the noncritical path in each iteration of the factorization algorithm. By properly lowering the frequency of the processing units to just eliminate the slac, we can reduce power consumption without impacting performance for the current iteration. Prior studies have shown that the execution time of compute-intensive worloads is proportional to the frequency of processing units. Based on this observation, we derive the ideal target frequency for the processing units on the noncritical path for given current and targeting execution time. In GreenLA, we tae into account the available discrete frequencies provided by CPU/GPU DVFS. In cases that the ideal target frequency is not equivalent to an available frequency, we use a weighted sum of two available neighboring frequencies and run the processing units at each frequency for a ratio of duration. Table I lists the notation used in the algorithms and formulation henceforth. Algorithm 1 details the selection of the CPU or GPU frequency if slac occurs and Algorithm 2 presents the frequency approximation [35] [29] with CP-aware energy efficient DVFS scheduling. For simplicity and readability, we use five helping functions in these two algorithms: SetFreq(), GetApprox- Ratio(), GetSlac(), GetCurFreq(), and GetOptFreq(). Of these, SetFreq() is a wrapper of CPU/GPU DVFS APIs that set specific CPU/GPU frequencies and GetCurFreq() is used to inquire the current frequency in use. The other three functions are more complex and will be detailed next. These three functions use prior nowledge of the mapping between frequency and execution time for the CPU and GPU tass. Specifically, GreenLA records runtime of the first CPU-GPU tass at different frequencies by instrumenting Algorithm 1 CPU/GPU DVFS Scheduling CPU GPU DVFS(CP, f Set, tas, ) 1: if (tas CP ) then 2: SetFreq(f h ) 3: else 4: slac GetSlac(tas, ) 5: if (slac > 0) then 6: f ideal GetOptFreq(tas, slac) 7: LastF req GetCurFreq() 8: fset fset f ideal LastF req 9: if (T CP U < T GP U ) then /* CPU DVFS */ 10: Call CP SSR(slac, f Set) or CP RSR(slac, fset) 11: else if (T CP U > T GP U ) then /* GPU DVFS */ 12: Call GPU DVFS(slac, f Set) 13: end if Algorithm 2 Strict CP-aware Slac Reclamation CP SSR(slac, f Set) 1: if (f l f ideal f h ) then 2: if (f ideal / fset) then 3: r GetApproxRatio(T lower, T upper, slac) 4: SetFreq(f lower, f upper, r) 5: else SetFreq(f ideal ) 6: else if (f ideal < f l ) then 7: SetFreq(f l ) 8: end if timestamps in the source codes, and use them to estimate the runtime and slac for the rest of the CPU-GPU tass using Equations 1-4. Given T, T lower, T upper, and slac of each pair of CPU-GPU tass, we split frequency with ratio r and the ideal frequency f ideal can be solved as follows: T + slac = T lower r + T upper (1 r) f ideal (T + slac) = f h T where T f h, T lower f lower, T upper f upper subject to f l f lower < f ideal < f upper f h The resulting target frequency f ideal is compared against the available frequencies (i.e., line 2 in Algorithm 2). If it matches an available frequency, the matched available frequency can be used directly. Otherwise, neighboring frequencies f lower and f upper are assigned in accordance with the ratios r and 1 r individually. In case that f ideal is lower than the lowest available frequency, the lowest available frequency is adopted, as setched in Algorithm 2. B. Relaxed Slac Reclamation In contrast to strict slac reclamation that applies DVFS at each iteration of the algorithms, relaxed slac reclamation forms multiple iterations into groups and apply DVFS at the group level. Relaxed slac reclamation offers two advantages. First, it reduces the time and energy overhead incurred by (9)

6 Algorithm 3 Relaxed CP-aware Slac Reclamation CP RSR(slac, f Set) 1: if (f l f ideal f h ) then 2: if (f ideal / fset) then 3: r GetApproxRatio(T lower, T upper, slac) 4: if (r < RlxF ctr) then 5: if (LastF req f upper ) then 6: SetFreq(f upper ) 7: LastF req = f upper 8: else Do Nothing 9: else SetFreq(f lower, f upper, r) 10: else SetFreq(f ideal ) 11: else if (f ideal < f l ) then 12: SetFreq(f l ) 13: end if frequent DVFS scheduling [31]. Second, it reclaims extra slac on the CPU and GPU caused by data dependencies, in addition to the slac caused by worload imbalance that strict reclamation targets. In each iteration, the GPU waits for the factorized panel from the CPU, and the CPU waits for the updated panel from the GPU. The slac is reclaimed by relaxed slac reclamation but not by strict slac reclamation. We use a relaxation factor (RlxF ctr in Algorithm 3) to determine the number of iterations in a group for scheduling decisions. As shown in Algorithm 3, if the calculated split frequency ratio r is less than RlxF ctr (e.g., 0.05), the duration at f lower is negligible according to Equation 9. In this case, we run the processing units at f upper. The selection of RlxF ctr is based on algorithmic characteristics. Slac varies with iterations and the variation rate provides us the criteria of choosing an appropriate RlxF ctr for the optimized energy efficiency. Note that even with a constant RlxF ctr, the number of iterations in a group may vary over time during a run. C. Coupled GPU Core and Memory DVFS DVFS can be applied to power-scalable hardware components, including CPU, GPU, and memory. It is noteworthy that on today s architectures such as GPU, core and memory frequencies are coupled and have to be switched simultaneously as a combination [5], which differs from CPU DVFS where only core frequency is scaled. Table II lists memorycore frequency pairs for two NVIDIA GPU. TABLE II GPU MEM.-CORE FREQ. PAIRS (UNIT: MHZ). NVIDIA Kepler Tesla K20c NVIDIA Kepler Tesla K40c Memory Freq. Core Freq. Memory Freq. Core Freq Algorithm 4 GPU Core/Memory DVFS Scheduling GPU DVFS(slac, f Set) 1: if (f l f ideal f h ) then 2: Call CP SSR(slac, f Set) or CP RSR(slac, fset) 3: else if (f l f ideal < f l ) then 4: 5: r GetApproxRatio(T l, T l, SetFreq(f l, f l, r) 6: else if (f ideal < f l ) then 7: SetFreq(f l ) 8: end if As discussed earlier, in our scenario, slac may occur either on the CPU side or on the GPU side. Algorithm 1 strategically maes CPU/GPU DVFS decisions depending on the source of slac. In particular, for the case of eliminating slac from GPU side, due to the coupled core and memory frequencies of GPU, a combined GPU core/memory DVFS scheduling strategy is necessary. Line 3-7 in Algorithm 4 details the combined strategy, where split frequency ratio r is calculated similarly as Equation 9. Equation 10 shows the calculation, and the difference is that scaling down to f l pairs with memory frequency reduction to the lowest. T + slac = T l r + T l (1 r) f ideal (T + slac) = f h T where T f h, T l f l, T l f l subject to f l f ideal < f l (10) In our experiments, GPU tass that update the trailing matrices involve considerable computation and memory accesses. Therefore simultaneously decreasing GPU core/memory frequencies has dual performance impact on computation and memory accesses. Our approach taes care of these scenarios since the dual slowdown has been recorded in T, which is the runtime of a tas at the lowest core and memory frequencies. V. EVALUATION In this section we detail the evaluation of GreenLA on a GPU-accelerated heterogeneous system: a linear algebra library of dense matrix factorizations (Cholesy, LU and QR) with an energy efficient CPU/GPU DVFS co-scheduling approach via online algorithmic slac prediction. A. Evaluation Methodology For comparison purposes, we present a state-of-the-art OS level method for slac prediction, and another type of classic DVFS scheduling strategy for saving energy. We stress the difference in other approaches against ours, and argue that our solution can outperform in both the accuracy of slac prediction and the amount of energy savings in our scenario. OS Level Slac Prediction. As another important slac prediction method, OS level slac prediction either wor for a specific type of applications sharing similar features, e.g., with stable/slowly-varying execution characteristics, or require considerable training to obtain accurate prediction results.

7 Online prediction mechanism presented in [24] [30] [29] is based on a simple assumption that tas behavior is identical every time a tas is executed. It is however defective for applications with variable worloads, such as matrix factorizations, where the remaining unfinished matrices become smaller as the factorizations proceed. Execution time shrins and slac varies as the worloads become lighter, which invalidates the above prediction mechanism. Regardless of the simplest prediction above, several enhanced history-based worload prediction algorithms have been proposed to handle the variation in HPC runs and produce more accurate prediction results [34] [12] [21] [18]. The RELAX algorithm employed in CPU MISER [18] exploits both prior predicted profiles and current runtime measured profiles: W i+1 = (1 λ)w i + λw i, where λ is a relaxation factor for adjusting the percentage of dependent information on the current measurement. This enhanced prediction can also be error-prone for dense matrix factorizations, since using a fixed λ cannot handle length variation of iterations of the core loop due to the shrining remaining unfinished matrices. The use of 2-D bloc cyclic data distribution further brings complexity to the prediction. Moreover, statistical predictive models have been adopted for accurate worload prediction, e.g., Hidden Marov Models (HMM) used in [36] and Predictive Bayesian Networ (PBN) used in [23]. Using offline training and learning based on historical records, results with high accuracy were achieved (average prediction error 3.3% via HMM and 0.43% via PBN). Although effective offline, online slac prediction for HPC applications using statistical models can be costly: Considerable amount of execution traces are required to train the statistical predictive models for accurate slac prediction. For instance, the training dataset in [36] was obtained by running applications on one server and evaluated on one different server, which can be impractical for HPC runs as discussed earlier. Race-to-halt Energy Saving. As the name suggests, race-tohalt (or race-to-idle) is an energy saving strategy that enforces power-scalable processors (e.g., CPU and GPU) to race when worloads are ready for processing, and to halt when no tass are present and the processors are idle/waiting. In other words, race refers to executing the worloads at the highest frequency and voltage of the processors for the pea performance until the finish of the worloads, and halt implies that processor frequency and voltage are switched to the lowest level from the end of the last executed worload to the start of the next worload. This straightforward solution can effectively save energy without incurring performance loss due to the following inferences: (a) The pea performance of processors is guaranteed during computation as in original runs; (b) the pea performance of processors is not needed when no tass are being executed and processors are waiting for data. As discuss earlier, in our scenario, slac can arise at either CPU side or GPU side, depending on various factors. In either case, the pea performance of the idle/waiting processors is not necessary. Per race-to-halt, we apply to them the lowest power state during the slac and switch bac to the highest one until the next worload is available. race-to-halt is CPfree such that no CP detection is required before any energy saving decisions are made. Thus it is generally lightweight and easy to implement. We implemented OS and library level approaches using race-to-halt and online HMM-enabled statistical slac prediction individually. Further, we implemented relaxed slac reclamation to compare with the default strict slac reclamation. Evaluated metrics include slac prediction accuracy, and energy and performance efficiency. For readability, we henceforth denote different test cases as follows: MAGMA: The original MAGMA runs of different-scale CPU-GPU Cholesy, LU and QR factorizations without any energy saving approaches; OS_r2h: The OS level implementation [2] based on a CPU race-to-halt worload prediction algorithm similar to the RELAX algorithm; lib_r2h: The library level implementation based on algorithmic race-to-halt on both CPU and GPU; OS_cpsr_str: The OS level implementation based on online HMM-enabled statistical slac prediction, with strict slac reclamation for each iteration; OS_cpsr_rlx: The OS level implementation based on online HMM-enabled statistical slac prediction, with relaxed slac reclamation (RlxF ctr = 0.05) for bloced iterations; lib_cpsr_str: The library level implementation based on online algorithmic slac prediction, with strict slac reclamation for each iteration; lib_cpsr_rlx: The library level implementation based on online algorithmic slac prediction, with relaxed slac reclamation (RlxF ctr = 0.05) for bloced iterations. Among all energy saving solutions, lib_cpsr_rlx empirically achieves the optimal energy efficiency with negligible performance loss, and thus we adopt it as our GreenLA in the comparison against MAGMA later. B. Experimental Setup We applied all above test scenarios to CPU-GPU Cholesy, LU and QR factorizations (MAGMA version 1.6.1) with multiple global matrix sizes each (ranging from 5120 to 20480). However, due to limit space, we only show the result for input size of For other input matrix sizes, the results are similar. All experiments were performed on a power-aware many-core CPU-GPU server. Table III lists hardware configuration of the experimental platform. The total system dynamic and static/leaage energy consumption of the above runs was measured using nvidia-smi tool [5] provided by NVIDIA, and following [26], we used PowerPac [19], an integrated software/hardware framewor for profiling and analysis of power/energy costs of HPC systems and applications. A separate meter node with PowerPac deployed was used to collect power/energy costs of all hardware components of the system, and the data was recorded in a log file and accessed after the above runs.

8 TABLE III HARDWARE CONFIGURATION FOR EXPERIMENTS. Component CPU GPU Processor 2 10-core Intel Xeon 2496 CUDA-core NVIDIA Ivy Bridge E Kepler GK110 Tesla K20c Pea Perf. 0.4 TFLOPS 1.17 TFLOPS Core&Mem. Core: ( by0.1)ghz Freq. Gear Mem.:Not DVFS-capable See left column of Table II Memory 64 GB RAM 5 GB RAM Cache 64 KB L1, 256 KB L2, 13 SMX units, 64 KB and 25.6 MB L3 48 KB read-only d-cache OS Fedora 21, 64-bit Linux ernel Pwr. Meter PowerPac nvidia-smi with -ac option VI. RESULTS Next we present experimental results of our evaluation via fine-grained comparison. We first demonstrate the performance and energy efficiency of our approach by comparing to the widely used numerical linear algebra MAGMA library. TABLE IV AVERAGE ERROR RATES OF SLACK PREDICTION FOR FOUR RUNS EACH OF CHOLESKY/LU/QR FACTORIZATION. OS Level Statistical Benchmars & Slac Prediction Library Level Algorit- Test Scenarios Base Iter. Base Iter. hmic Slac Prediction (First 10%) (First 20%) Cholesy ( ) 10.51% 6.62% 0.96% LU ( ) 9.95% 5.45% 0.16% QR ( ) 11.29% 5.77% 0.52% A. Average Error Rate of Slac Prediction We first showcase the accuracy of slac prediction of the OS level statistical approach with two training datasets and our library level algorithmic approach. Table IV summarizes the average error rate of slac prediction of the two approaches. As stated in section 4.3, HMM-based statistical slac prediction requires a group of base iterations (usually the first few iterations of a HPC run) to serve as an online training dataset. The prediction accuracy is highly associated with the size of the training dataset according to Table IV: The more base iterations are used, the more accuracy is achieved. However, the highest accuracy of the OS level approach is 5.45% for LU, while our library level approach can be as accurate as having a 0.16% error rate for LU. This low error rate of slac prediction can greatly facilitate forthcoming energy saving. For further comparison, we select the OS level statistical approach with higher slac prediction accuracy for more experiments. B. Total Energy Saving Comparison As shown in Figure 4, our library level CP-aware slac reclamation approaches could save more energy than current state-of-the-art approaches. Current approaches could only either save less energy or even costs extra energy. For example, OS level race-to-halt only slows down CPU when CPU utilization is below a threshold, while library level race-to-halt reduces both CPU and GPU speed when no corresponding worloads are running according to algorithmic characteristics. Due to high online probing overhead, the OS level race-to-halt approach incurs even more energy consumption, while the more lightweight library level race-to-halt approach can save minor energy savings (up to 3.6%) as shown in Figure 4. Other two approaches we compared are OS level approaches statistical slac prediction OS_cpsr_str and OS_cpsr_rlx. Since the two solutions produced inaccurate slac prediction, the inaccuracy results in inappropriate timing and duration of DVFS, which cannot eliminate possible slac saving less energy than the optimal, or incurs performance loss due to overdue or overdone DVFS consuming even more energy than the original run. As shown in Figure 4, those two approaches consumed more energy(1% 2%). Even if the OS level slac prediction can achieve the same accuracy as our library level approaches, the OS level solutions can waste much more energy saving opportunities than our library level approaches due to the considerable amount of iterations used for training, compared to only execution information of the first iteration needed by our library approach. On the other hand, our library level CP-aware slac reclamation approaches could save several times more energy. specifically, the energy saved from our approach is 2.5x of the energy saved using current best approach in Cholesy factorization, 1.5x in LU factorization and 3x in QR factorization. Moreover, different than our strict slac reclamation, which tries to reclaim all slacs using DVFS, our relaxed slac reclamation only tries to apply DVFS when split frequency ratio is larger than the RlxF ctr, which eliminated some unnecessary power state adjustments. The reduced number of power adjustment brings less DVFS performance overhead, which further saves more energy for the overall application. Note that, the Cholesy, LU and QR factorizations are very compute intensive. Based on the energy efficiency model in [11], their heterogeneous implementations in MAGMA have really high computation efficiency, which mae them hard to save more energy. Although there are many hardware components involve the execution process, we only focus on reducing the energy consumption of CPU and GPU. C. CPU Energy Saving Comparison Now, we focus on the energy saving on the CPU side. Note that since we only focus on reducing the energy cost of CPU, the energy measurement here does not include RAM energy consumption. We can see from Figure 5 the single core CPU energy comparison of Cholesy, LU and QR factorization using different energy saving solutions. As mentioned before, OS level race-to-halt only adjust the performance/power state of the CPU, which also introduce more probing overhead, it cost more energy on the CPU side (7% 8%). As for the OS level online HMM-enabled statistical slac prediction approaches, they need first 10% 20% iterations to do online training on the CPU, which not only brings more overhead, but also wastes valuable slac reclamation(energy saving) opportunities. However, thans to the high accurate algorithmic slac prediction, our library level CP-aware slac reclamation

9 Total Energy Saving Comaprison GPU Energy Saving Comparison 10.00% 20.00% 8.00% 6.00% 15.00% 4.00% 10.00% 2.00% 0.00% -2.00% -4.00% Cholesy LU QR 5.00% 0.00% Cholesy LU QR -6.00% -8.00% -5.00% % % OS_r2h OS_cpsr_str OS_cpsr_rlx lib_r2h lib_cpsr_str lib_cpsr_rlx OS_r2h OS_cpsr_str OS_cpsr_rlx lib_r2h lib_cpsr_str lib_cpsr_rlx Fig. 4. The total amount of energy saved in percentage using several different energy saving solutions. Right two bars in each group show the result of our approach, which can save up to 3x more energy than the current best solution % CPU Energy Saving Comparison Fig. 6. The amount of energy saved in percentage on GPU using several different energy saving solutions. Positive values indicate energy saving. Negative values indicate extra energy cost. Our approaches(right two bars in each group) shows more energy saving on GPU than existing state-of-the-art solutions. Sec 25 Execution Time Comparison 15.00% % 5.00% % Cholesy LU QR % % % OS_r2h OS_cpsr_str OS_cpsr_rlx lib_r2h lib_cpsr_str lib_cpsr_rlx 0 Cholesy LU QR Original OS_cpsr_str OS_cpsr_rlx OS_r2h lib_r2h lib_cpsr_str lib_cpsr_rlx Fig. 5. The amount of energy saved in percentage on single core CPU using several different energy saving solutions. Positive values indicate energy saving. Negative values indicate extra energy cost. Our approaches(right two bars in each group) shows more energy saving on CPU than existing stateof-the-art solutions. approaches could save more energy on the CPU side when the slac resides on CPU. D. GPU Energy Saving Comparison Next, we focus on the energy saving on the GPU side. As we can see from Figure 6, even GPU is assigned more computation tass, it usually finish its tass faster than the CPU, so slacs are more liely to occur on the GPU side, and thus it can save more energy. For library level race-tohalt approach, since it rely on algorithmic execution time Fig. 7. Execution time of Cholesy, LU and QR factorization using several different energy saving solutions. Right two bars in each group show our results, which have similar performance than existing state-of-the-art solutions. prediction, it can save energy to some degree. But it still waste some energy in halt state, so it saves less energy than our approach. As for OS level race-to-halt, it does not adjust the performance/power of the GPU at all and poor CPU side adjustment brings more performance overhead, so the overall execution time is prolonged, which results higher GPU energy consumption. Similar as on the CPU, OS level online prediction approach suffers from inaccurate slac prediction, which leads to higher GPU energy consumption. Our approaches, on the other hand, could save up to 16% energy on the GPU.

10 E. Time Overhead All inds of performance loss is observed from the experiments as shown in Figure 7. Typical performance degrading factors for OS level solutions consist of dynamic monitoring overhead (OS_r2h), online training overhead (OS_cpsr_str and OS_cpsr_rlx), DVFS overhead (all solutions), and performance loss from overdue or overdone DVFS due to inaccurate slac prediction (OS_cpsr_str and OS_cpsr_rlx). On the other hands, observed performance loss for library level solutions is from frequency approximation errors and DVFS overhead. Among all approaches, OS_cpsr_str has the highest performance loss (up to 14.4%, due to overdone DVFS from inaccurate slac prediction), while our lib_cpsr_str/lib_cpsr_rlx incurs minor performance loss (as low as 1.2%). VII. RELATED WORK The growing prevalence of heterogeneous architectures has motivated a large body of energy efficient approaches[27], but few of them were designed specifically for numerical linear algebra operations, such as dense matrix factorizations extensively used in HPC. Some efforts presented next can also be applied to heterogeneous systems with similar techniques. OS-LEVEL SCHEDULING. Liu et al. [25] proposed several power-aware techniques for a CPU-GPU heterogeneous system including two static/dynamic mapping algorithms and one aggressive voltage reduction scheme. Decent power and energy savings were achieved (more than 20%) towards several matrix worloads, but our wor targets different matrix algorithms with less slac and thus less energy saving opportunities. Their wor focused on time-sensitive applications that require significant computational capacity, such as real-time scoring of ban transactions, live video processing, etc. Our wor can also meet fine-grained timing requirements of applications, which is guaranteed by respecting tass on the critical path. Hong et al. [20] proposed an integrated power and performance model to statically determine the optimal number of processors for a given application running on GPU, based on the intuition that using more cores is not necessary for applications reaching the pea memory bandwidth. By using fewer GPU cores, average 11% energy savings can be achieved for memory bandwidth limited applications. The proposed system can be used by a thread scheduler for online energy saving decision-maing. This approach was also evaluated on different applications than us the more energy savings do not demonstrate their strength. LIBRARY-LEVEL SCHEDULING. Alonso et al. [8] incorporated two energy saving techniques at library level to schedule the computation of dense linear algebra operations on a hybrid platform of a multicore CPU and multiple GPU. Specifically, idle threads were bloced when no tass to process, and busy-waiting threads were also bloced by synchronization primitives when waiting for a device to finish its wor. Due to lac of consideration of algorithmic characteristics of dense linear algebra operations, the reported average energy cost reduction was around 4% for Cholesy and 7% for LU. Anzt et al. [10] applied energy efficient techniques on GPUaccelerated iterative linear solvers for memory-intensive sparse linear systems, and demonstrated that considerable energy savings (17.8% on average) can be fulfilled without harming performance noticeably, by setting CPU to a low power state during the time when GPU is running while CPU is busywaiting. However, the proposed solution cannot wor for our scenario where CPU and GPU frequently interact with data movement. Note that there exists more slac for sparse linear algebra operations and more energy savings are expected compared to dense ones. ONLINE AND OFFLINE WORKLOAD PREDICTION. There exist numerous solutions that predict worload and slac, facilitating energy saving decision-maing, spanning from online to offline. Zhu et al. [36] proposed a power-aware consolidation scheme of scientific worflow tass for energy and resource cost optimization. The pscimapper framewor consists of online consolidation and offline analysis for resource usage prediction (e.g., CPU utilization) using Hidden Marov Model (HMM), with reported average prediction error of moderate 3.3%. However, the drawbac of this approach is considerable slowdown around 15%, which is unacceptable in HPC nowadays. For comparison purposes, we also adopt HMM-based slac prediction in an online fashion instead, and experimental results indicate a higher prediction error (up to 11.29%) can be incurred. Li et al. [23] applied a Predictive Bayesian Networ to identify daily worload patterns and adjust resource provisioning accordingly for cloud datacenters. The prediction algorithm was evaluated to be considerably effective (only 0.43% average prediction error was observed). Our wor differs from this offline worload prediction GreenLA is able to achieve energy savings online for HPC runs using negligible amount of training dataset from the earlier stage of the runs. Tse et al. [33] proposed a novel Monte Carlo simulation framewor that supports multiple types of hardware accelerators (FPGA and GPU) and provided scheduling interfaces to adaptively perform load balancing at runtime for performance and energy efficiency. The energy savings achieved is however from performance gain obtained from the collaborative simulation framewor, not from an energy efficient strategy. VIII. CONCLUSIONS AND FUTURE WORK Energy efficiency is becoming a critical factor of concern when achieving parallelism in high performance scientific computing in this era. The growing prevalence of heterogeneous architectures nowadays brings more concerns on saving energy for the emerging systems. Essentially fulfilling energy efficiency requires accurate slac prediction with minor performance degradation. Existing energy efficient approaches span from OS level to application level, which can either be

11 inaccurate or cost-inefficient due to variable execution patterns of the target applications and lengthy training of the employed prediction model. In this paper, we propose a lightweight energy efficient approach for widely used numerical linear algebra software that utilizes algorithmic characteristics to obtain accurate slac prediction and thus gain the optimal energy savings. Experimental results on a many-core CPU- GPU platform demonstrate that our library level solution can achieve up to 8.5% energy saving than original implementation with negligible performance loss (as low as 1.2%), which 3x more energy savings compared to classic race-to-halt and worload prediction approaches. Although the currently achieved energy savings are moderate, provided a limited amount of slac for the target applications, more energy can be saved by reducing the minor performance loss incurred by our approach. It is practical and worthwhile since careful and fine-grained DVFS analysis is able to further decrease the number of DVFS switches and errors of frequency approximation. Possible energy savings can also be obtained from improved application characteristics that facilitate power reduction, such as CPU worload centralization and idle/unused core isolation, etc. We are also interested in investigating the energy impact of matrix factorization bloc sizes. It is possible that the optimal bloc size for performance differs from the optimal bloc size for energy costs. There may exist a trade-off between them. We further plan to extend the wor to more scientific applications on other emerging hardware and architectures in the near future. ACKNOWLEDGMENT The authors would lie to than NVIDIA for providing GPU devices used for experiments. This wor is partially supported by the NSF grants CCF , ACI , CCF , CCF , the SZSTI basic research program JCYJ , and the Special Program for Applied Research on Super Computation of the NSFC Guangdong Joint Fund (the second phase). REFERENCES [1] BLAS (Basic Linear Algebra Subprograms). [2] CPUSpeed. [3] Green500 Supercomputer Lists. [4] MAGMA (Matrix Algebra on GPU and Multicore Architectures). [5] NVIDIA System Management Interface (nvidia-smi). [6] TOP500 Supercomputer Lists. [7] P. Alonso, M. F. Dolz, F. D. Igual, R. Mayo, and E. S. Quintana-Ortí. DVFS-control techniques for dense linear algebra operations on multicore processors. CSRD, 27(4): , Nov [8] P. Alonso, M. F. Dolz, F. D. Igual, R. Mayo, and E. S. Quintana-Ortí. Reducing energy consumption of dense linear algebra operations on hybrid CPU-GPU platforms. In Proc. ISPA, pages 56 62, [9] P. Alonso, M. F. Dolz, R. Mayo, and E. S. Quintana-Ortí. Improving power efficiency of dense linear algebra algorithms on multi-core processors via slac control. In Proc. HPCS, pages , [10] H. Anzt, V. Heuveline, J. Aliaga, M. Castillo, J. C. Fernández, R. Mayo, and E. S. Quintana-Ortí. Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms. In Proc. IGCC, pages 1 6, [11] J. Choi, D. Bedard, R. J. Fowler, and R. W. Vuduc. A roofline model of energy. In IPDPS, pages IEEE Computer Society, [12] K. Choi, R. Soma, and M. Pedram. Fine-grained dynamic voltage and frequency scaling for precise energy and performance trade-off based on the ratio of off-chip access to on-chip computation times. In Proc. DATE, pages 18 28, [13] H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte, and O. Mutlu. Memory power mamagement via dynamic voltage/frequency scaling. In Proc. ICAC, pages 31 40, [14] Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. Memscale: Active low-power modes for main memory. In Proc. ASPLOS, pages , [15] R. Efraim, R. Ginosar, C. Weiser, and A. Mendelson. Energy aware race to halt: A down to EARtH approach for platform energy management. IEEE Computer Architecture Letters, 13(1):25 28, Jan [16] V. W. Freeh and D. K. Lowenthal. Using multiple energy gears in MPI programs on a power-scalable cluster. In Proc. PPoPP, pages , [17] R. Ge, X. Feng, and K. W. Cameron. Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters. In Proc. SC, page 34, [18] R. Ge, X. Feng, W.-C. Feng, and K. W. Cameron. CPU MISER: A performance-directed, run-time system for power-aware clusters. In Proc. ICPP, page 18, [19] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. W. Cameron. PowerPac: Energy profiling and analysis of high-performance systems and applications. IEEE Transactions on Parallel and Distributed Systems, 21(5): , May [20] S. Hong and H. Kim. An integrated GPU power and performance model. In Proc. ISCA, pages , [21] C.-H. Hsu and W.-C. Feng. A power-aware run-time system for highperformance computing. In Proc. SC, page 1, [22] T. Ishihara and H. Yasuura. Voltage scheduling problem for dynamically variable voltage processors. In Proc. ISLPED, pages , [23] J. Li, K. Shuang, S. Su, Q. Huang, P. Xu, X. Cheng, and J. Wang. Reducing operational costs through consolidation with resource prediction in the cloud. In Proc. CCGrid, pages , [24] M. Y. Lim, V. W. Freeh, and D. K. Lowenthal. Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In Proc. SC, page 107, [25] C. Liu, J. Li, W. Huang, J. Rubio, E. Speight, and X. Lin. Power-efficient time-sensitive mapping in heterogeneous systems. In Proc. PACT, pages 23 32, [26] H. Ltaief, P. Luszcze, and J. Dongarra. Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency. Computer Science - R&D, 27(4): , [27] S. Mittal and J. S. Vetter. A survey of methods for analyzing and improving GPU energy efficiency. CoRR, abs/ , [28] N. B. Rizvandi, J. Taheri, and A. Y. Zomaya. Some observations on optimal frequency selection in DVFS-based energy consumption minimization. Journal of Parallel Distributed Computing, 71(8): , Aug [29] B. Rountree, D. K. Lowenthal, B. R. de Supinsi, M. Schulz, V. W. Freeh, and T. Bletsch. Adagio: Maing DVS practical for complex HPC applications. In Proc. ICS, pages , [30] B. Rountree, D. K. Lowenthal, S. Fun, V. W. Freeh, B. R. de Supinsi, and M. Schulz. Bounding energy consumption in large-scale MPI programs. In Proc. SC, pages 1 9, [31] L. Tan, L. Chen, Z. Chen, Z. Zong, R. Ge, and D. Li. HP- DAEMON: High performance distributed adaptive energy-efficient matrix-multiplication. In Proc. ICCS, pages , [32] L. Tan and Z. Chen. Slow down or halt: Saving the optimal energy for scalable HPC systems. In Proc. ICPE, [33] A. H. T. Tse, D. B. Thomas, K. H. Tsoi, and W. Lu. Dynamic scheduling Monte-Carlo framewor for multi-accelerator heterogeneous clusters. In Proc. FPT, pages , [34] A. Varma, B. Ganesh, M. Sen, S. R. Choudhury, L. Srinivasan, and J. Bruce. A control-theoretic approach to dynamic voltage scheduling. In Proc. CASES, pages , [35] A. S. Wu, H. Yu, S. Jin, K.-C. Lin, and G. Schiavone. An incremental genetic algorithm approach to multiprocessor scheduling. IEEE Transactions on Parallel and Distributed Systems, 15(9): , Sept [36] Q. Zhu, J. Zhu, and G. Agrawal. Power-aware consolidation of scientific worflows in virtualized environments. In Proc. SC, pages 1 12, 2010.

國家圖書館典藏電子全文

國家圖書館典藏電子全文 i ii Abstract The most important task in human resource management is to encourage and help employees to develop their potential so that they can fully contribute to the organization s goals. The main

More information

2/80 2

2/80 2 2/80 2 3/80 3 DSP2400 is a high performance Digital Signal Processor (DSP) designed and developed by author s laboratory. It is designed for multimedia and wireless application. To develop application

More information

IP TCP/IP PC OS µclinux MPEG4 Blackfin DSP MPEG4 IP UDP Winsock I/O DirectShow Filter DirectShow MPEG4 µclinux TCP/IP IP COM, DirectShow I

IP TCP/IP PC OS µclinux MPEG4 Blackfin DSP MPEG4 IP UDP Winsock I/O DirectShow Filter DirectShow MPEG4 µclinux TCP/IP IP COM, DirectShow I 2004 5 IP TCP/IP PC OS µclinux MPEG4 Blackfin DSP MPEG4 IP UDP Winsock I/O DirectShow Filter DirectShow MPEG4 µclinux TCP/IP IP COM, DirectShow I Abstract The techniques of digital video processing, transferring

More information

Microsoft PowerPoint - Aqua-Sim.pptx

Microsoft PowerPoint - Aqua-Sim.pptx Peng Xie, Zhong Zhou, Zheng Peng, Hai Yan, Tiansi Hu, Jun-Hong Cui, Zhijie Shi, Yunsi Fei, Shengli Zhou Underwater Sensor Network Lab 1 Outline Motivations System Overview Aqua-Sim Components Experimental

More information

10384 19020101152519 UDC Rayleigh Quasi-Rayleigh Method for computing eigenvalues of symmetric tensors 2 0 1 3 2 0 1 3 2 0 1 3 2013 , 1. 2. [4], [27].,. [6] E- ; [7], Z-. [15]. Ramara G. kolda [1, 2],

More information

Microsoft PowerPoint _代工實例-1

Microsoft PowerPoint _代工實例-1 4302 動態光散射儀 (Dynamic Light Scattering) 代工實例與結果解析 生醫暨非破壞性分析團隊 2016.10 updated Which Size to Measure? Diameter Many techniques make the useful and convenient assumption that every particle is a sphere. The

More information

Preface This guide is intended to standardize the use of the WeChat brand and ensure the brand's integrity and consistency. The guide applies to all d

Preface This guide is intended to standardize the use of the WeChat brand and ensure the brand's integrity and consistency. The guide applies to all d WeChat Search Visual Identity Guidelines WEDESIGN 2018. 04 Preface This guide is intended to standardize the use of the WeChat brand and ensure the brand's integrity and consistency. The guide applies

More information

PowerPoint Presentation

PowerPoint Presentation Decision analysis 量化決策分析方法專論 2011/5/26 1 Problem formulation- states of nature In the decision analysis, decision alternatives are referred to as chance events. The possible outcomes for a chance event

More information

WTO

WTO 10384 200015128 UDC Exploration on Design of CIB s Human Resources System in the New Stage (MBA) 2004 2004 2 3 2004 3 2 0 0 4 2 WTO Abstract Abstract With the rapid development of the high and new technique

More information

Knowledge and its Place in Nature by Hilary Kornblith

Knowledge and its Place in Nature by Hilary Kornblith Deduction by Daniel Bonevac Chapter 7 Quantified Natural Deduction Quantified Natural Deduction As with truth trees, natural deduction in Q depends on the addition of some new rules to handle the quantifiers.

More information

VASP应用运行优化

VASP应用运行优化 1 VASP wszhang@ustc.edu.cn April 8, 2018 Contents 1 2 2 2 3 2 4 2 4.1........................................................ 2 4.2..................................................... 3 5 4 5.1..........................................................

More information

BC04 Module_antenna__ doc

BC04 Module_antenna__ doc http://www.infobluetooth.com TEL:+86-23-68798999 Fax: +86-23-68889515 Page 1 of 10 http://www.infobluetooth.com TEL:+86-23-68798999 Fax: +86-23-68889515 Page 2 of 10 http://www.infobluetooth.com TEL:+86-23-68798999

More information

Microsoft PowerPoint - ATF2015.ppt [相容模式]

Microsoft PowerPoint - ATF2015.ppt [相容模式] Improving the Video Totalized Method of Stopwatch Calibration Samuel C.K. Ko, Aaron Y.K. Yan and Henry C.K. Ma The Government of Hong Kong Special Administrative Region (SCL) 31 Oct 2015 1 Contents Introduction

More information

hks298cover&back

hks298cover&back 2957 6364 2377 3300 2302 1087 www.scout.org.hk scoutcraft@scout.org.hk 2675 0011 5,500 Service and Scouting Recently, I had an opportunity to learn more about current state of service in Hong Kong

More information

untitled

untitled LBS Research and Application of Location Information Management Technology in LBS TP319 10290 UDC LBS Research and Application of Location Information Management Technology in LBS , LBS PDA LBS

More information

Microsoft PowerPoint - talk8.ppt

Microsoft PowerPoint - talk8.ppt Adaptive Playout Scheduling Using Time-scale Modification Yi Liang, Nikolaus Färber Bernd Girod, Balaji Prabhakar Outline QoS concerns and tradeoffs Jitter adaptation as a playout scheduling scheme Packet

More information

Microsoft PowerPoint - ryz_030708_pwo.ppt

Microsoft PowerPoint - ryz_030708_pwo.ppt Long Term Recovery of Seven PWO Crystals Ren-yuan Zhu California Institute of Technology CMS ECAL Week, CERN Introduction 20 endcap and 5 barrel PWO crystals went through (1) thermal annealing at 200 o

More information

國立中山大學學位論文典藏.PDF

國立中山大學學位論文典藏.PDF I II III The Study of Factors to the Failure or Success of Applying to Holding International Sport Games Abstract For years, holding international sport games has been Taiwan s goal and we are on the way

More information

XML SOAP DOM B2B B/S B2B B2B XML SOAP

XML SOAP DOM B2B B/S B2B B2B XML SOAP 10384 9831010 U D C B2B 2 0 0 1 4 2 0 0 1 5 2 0 0 1 2001 4 XML SOAP DOM B2B B/S B2B B2B XML SOAP ABSTRACT Based on the research of Supply Chain Management theory and E-Commerce theory, especially in Business

More information

untitled

untitled Co-integration and VECM Yi-Nung Yang CYCU, Taiwan May, 2012 不 列 1 Learning objectives Integrated variables Co-integration Vector Error correction model (VECM) Engle-Granger 2-step co-integration test Johansen

More information

Improved Preimage Attacks on AES-like Hash Functions: Applications to Whirlpool and Grøstl

Improved Preimage Attacks on AES-like Hash Functions: Applications to Whirlpool and Grøstl SKLOIS (Pseudo) Preimage Attack on Reduced-Round Grøstl Hash Function and Others Shuang Wu, Dengguo Feng, Wenling Wu, Jian Guo, Le Dong, Jian Zou March 20, 2012 Institute. of Software, Chinese Academy

More information

2015年4月11日雅思阅读预测机经(新东方版)

2015年4月11日雅思阅读预测机经(新东方版) 剑 桥 雅 思 10 第 一 时 间 解 析 阅 读 部 分 1 剑 桥 雅 思 10 整 体 内 容 统 计 2 剑 桥 雅 思 10 话 题 类 型 从 以 上 统 计 可 以 看 出, 雅 思 阅 读 的 考 试 话 题 一 直 广 泛 多 样 而 题 型 则 稳 中 有 变 以 剑 桥 10 的 test 4 为 例 出 现 的 三 篇 文 章 分 别 是 自 然 类, 心 理 研 究 类,

More information

UDC Empirical Researches on Pricing of Corporate Bonds with Macro Factors 厦门大学博硕士论文摘要库

UDC Empirical Researches on Pricing of Corporate Bonds with Macro Factors 厦门大学博硕士论文摘要库 10384 15620071151397 UDC Empirical Researches on Pricing of Corporate Bonds with Macro Factors 2010 4 Duffee 1999 AAA Vasicek RMSE RMSE Abstract In order to investigate whether adding macro factors

More information

Chapter 24 DC Battery Sizing

Chapter 24  DC Battery Sizing 26 (Battery Sizing & Discharge Analysis) - 1. 2. 3. ETAP PowerStation IEEE 485 26-1 ETAP PowerStation 4.7 IEEE 485 ETAP PowerStation 26-2 ETAP PowerStation 4.7 26.1 (Study Toolbar) / (Run Battery Sizing

More information

國立中山大學學位論文典藏

國立中山大學學位論文典藏 i Examinations have long been adopting for the selection of the public officials and become an essential tradition in our country. For centuries, the examination system, incorporated with fairness, has

More information

Microsoft Word - TIP006SCH Uni-edit Writing Tip - Presentperfecttenseandpasttenseinyourintroduction readytopublish

Microsoft Word - TIP006SCH Uni-edit Writing Tip - Presentperfecttenseandpasttenseinyourintroduction readytopublish 我 难 度 : 高 级 对 们 现 不 在 知 仍 道 有 听 影 过 响 多 少 那 次 么 : 研 英 究 过 文 论 去 写 文 时 作 的 表 技 引 示 巧 言 事 : 部 情 引 分 发 言 该 生 使 在 中 用 过 去, 而 现 在 完 成 时 仅 表 示 事 情 发 生 在 过 去, 并 的 哪 现 种 在 时 完 态 成 呢 时? 和 难 过 道 去 不 时 相 关? 是 所 有

More information

% % % % % % ~

% % % % % % ~ 1001-5558 2015 03-0021-16 2010 C91 A 2014 5 2010 N. W. Journal of Ethnology 2015 3 86 2015.No.3 Total No.86 2010 2010 2181.58 882.99 40.47% 1298.59 59.53% 2013 2232.78 847.29 37.95% 1385.49 62.05% 1990

More information

THE APPLICATION OF ISOTOPE RATIO ANALYSIS BY INDUCTIVELY COUPLED PLASMA MASS SPECTROMETER A Dissertation Presented By Chaoyong YANG Supervisor: Prof.D

THE APPLICATION OF ISOTOPE RATIO ANALYSIS BY INDUCTIVELY COUPLED PLASMA MASS SPECTROMETER A Dissertation Presented By Chaoyong YANG Supervisor: Prof.D 10384 070302 9825042 UDC 2001.6. 2001.7. 20016 THE APPLICATION OF ISOTOPE RATIO ANALYSIS BY INDUCTIVELY COUPLED PLASMA MASS SPECTROMETER A Dissertation Presented By Chaoyong YANG Supervisor: Prof.Dr. Xiaoru

More information

X UDC A Post-Evaluation Research on SINOPEC Refinery Reconstruction and Expanding Project MBA 厦门大学博硕士论文摘要库

X UDC A Post-Evaluation Research on SINOPEC Refinery Reconstruction and Expanding Project MBA 厦门大学博硕士论文摘要库 2003 2 10384 X9915078 UDC A Post-Evaluation Research on SINOPEC Refinery Reconstruction and Expanding Project MBA 2003 2 2003 3 2003 200 / Abstract Post-evaluation is a comprehensive evaluation on an implemented

More information

Microsoft PowerPoint SSBSE .ppt [Modo de Compatibilidade]

Microsoft PowerPoint SSBSE .ppt [Modo de Compatibilidade] SSBSE 2015, Bergamo Transformed Search Based Software Engineering: A New Paradigm of SBSE He JIANG, Zhilei Ren, Xiaochen Li, Xiaochen Lai jianghe@dlut.edu.cn School of Software, Dalian Univ. of Tech. Outline

More information

Microsoft PowerPoint - NCBA_Cattlemens_College_Darrh_B

Microsoft PowerPoint - NCBA_Cattlemens_College_Darrh_B Introduction to Genetics Darrh Bullock University of Kentucky The Model Trait = Genetics + Environment Genetics Additive Predictable effects that get passed from generation to generation Non-Additive Primarily

More information

A Study on Grading and Sequencing of Senses of Grade-A Polysemous Adjectives in A Syllabus of Graded Vocabulary for Chinese Proficiency 2002 I II Abstract ublished in 1992, A Syllabus of Graded Vocabulary

More information

Settlement Equation " H = CrH 1+ e o log p' o + ( p' p' c o! p' o ) CcH + 1+ e o log p' c + p' f! ( p' p' c c! p' o ) where ΔH = consolidation settlem

Settlement Equation  H = CrH 1+ e o log p' o + ( p' p' c o! p' o ) CcH + 1+ e o log p' c + p' f! ( p' p' c c! p' o ) where ΔH = consolidation settlem Prediction of Compression and Recompression Indices of Texas Overconsolidated Clays Presented By: Sayeed Javed, Ph.D., P.E. Settlement Equation " H = CrH 1+ e o log p' o + ( p' p' c o! p' o ) CcH + 1+

More information

ENGG1410-F Tutorial 6

ENGG1410-F Tutorial 6 Jianwen Zhao Department of Computer Science and Engineering The Chinese University of Hong Kong 1/16 Problem 1. Matrix Diagonalization Diagonalize the following matrix: A = [ ] 1 2 4 3 2/16 Solution The

More information

Microsoft PowerPoint - STU_EC_Ch08.ppt

Microsoft PowerPoint - STU_EC_Ch08.ppt 樹德科技大學資訊工程系 Chapter 8: Counters Shi-Huang Chen Fall 2010 1 Outline Asynchronous Counter Operation Synchronous Counter Operation Up/Down Synchronous Counters Design of Synchronous Counters Cascaded Counters

More information

我国原奶及乳制品安全生产和质量安全管理研究

我国原奶及乳制品安全生产和质量安全管理研究 密 级 论 文 编 号 中 国 农 业 科 学 院 硕 士 学 位 论 文 我 国 原 奶 及 乳 制 品 质 量 安 全 管 理 研 究 Study on Quality and Safety Management of Raw Milk and Dairy Products in China 申 请 人 : 段 成 立 指 导 教 师 : 叶 志 华 研 究 员 张 蕙 杰 研 究 员 申 请

More information

08陈会广

08陈会广 第 34 卷 第 10 期 2012 年 10 月 2012,34(10):1871-1880 Resources Science Vol.34,No.10 Oct.,2012 文 章 编 号 :1007-7588(2012)10-1871-10 房 地 产 市 场 及 其 细 分 的 调 控 重 点 区 域 划 分 理 论 与 实 证 以 中 国 35 个 大 中 城 市 为 例 陈 会 广 1,

More information

~ ~

~ ~ * 40 4 2016 7 Vol. 40 No. 4 July 2016 35 Population Research 2014 1 2016 2016 9101. 0 40 49. 6% 2017 ~ 2021 1719. 5 160 ~ 470 100872 Accumulated Couples and Extra Births under the Universal Tw o-child

More information

Microsoft PowerPoint - Performance Analysis of Video Streaming over LTE using.pptx

Microsoft PowerPoint - Performance Analysis of Video Streaming over LTE using.pptx ENSC 427 Communication Networks Spring 2016 Group #2 Project URL: http://www.sfu.ca/~rkieu/ensc427_project.html Amer, Zargham 301149920 Kieu, Ritchie 301149668 Xiao, Lei 301133381 1 Roadmap Introduction

More information

國立中山大學學位論文典藏.PDF

國立中山大學學位論文典藏.PDF 國 立 中 山 大 學 企 業 管 理 學 系 碩 士 論 文 以 系 統 動 力 學 建 構 美 食 餐 廳 異 國 麵 坊 之 管 理 飛 行 模 擬 器 研 究 生 : 簡 蓮 因 撰 指 導 教 授 : 楊 碩 英 博 士 中 華 民 國 九 十 七 年 七 月 致 謝 詞 寫 作 論 文 的 過 程 是 一 段 充 滿 艱 辛 與 淚 水 感 動 與 窩 心 的 歷 程, 感 謝 這 一

More information

國家圖書館典藏電子全文

國家圖書館典藏電子全文 A Study on the Job Stress and the Ways of Coping for the Director of Elementary School in the Middle Area of Taiwan Abstract This study aims at probing the subject current status as related to stress and

More information

JOURNAL OF EARTHQUAKE ENGINEERING AND ENGINEERING VIBRATION Vol. 31 No. 5 Oct /35 TU3521 P315.

JOURNAL OF EARTHQUAKE ENGINEERING AND ENGINEERING VIBRATION Vol. 31 No. 5 Oct /35 TU3521 P315. 31 5 2011 10 JOURNAL OF EARTHQUAKE ENGINEERING AND ENGINEERING VIBRATION Vol. 31 No. 5 Oct. 2011 1000-1301 2011 05-0075 - 09 510405 1 /35 TU3521 P315. 8 A Earthquake simulation shaking table test and analysis

More information

南華大學數位論文

南華大學數位論文 -- Managing Traditional Temples A Case Study of Representative Temples in CHIA-YI i Abstract This research used the methodology of field study historical comparative research, and qualitative interview

More information

Abstract There arouses a fever pursuing the position of being a civil servant in China recently and the phenomenon of thousands of people running to a

Abstract There arouses a fever pursuing the position of being a civil servant in China recently and the phenomenon of thousands of people running to a Abstract There arouses a fever pursuing the position of being a civil servant in China recently and the phenomenon of thousands of people running to attend the entrance examination of civil servant is

More information

ABSTRACT ABSTRACT Based on analyzing public corporation in foreign countries, this paper studies basic theories of public legal establishment, with our country s reality in the social transferring period

More information

國立高雄大學○○○○○○學系(研究所)(標楷體18號字

國立高雄大學○○○○○○學系(研究所)(標楷體18號字 國 立 高 雄 大 學 都 市 發 展 與 建 築 研 究 所 碩 士 論 文 高 雄 後 勁 地 區 傳 統 民 居 特 徵 之 研 究 The Study of The Characteristic of Traditional Residential Buildings of Houjing District in Kaohsiung 研 究 生 : 許 輝 隆 撰 指 導 教 授 : 陳 啟

More information

穨control.PDF

穨control.PDF TCP congestion control yhmiu Outline Congestion control algorithms Purpose of RFC2581 Purpose of RFC2582 TCP SS-DR 1998 TCP Extensions RFC1072 1988 SACK RFC2018 1996 FACK 1996 Rate-Halving 1997 OldTahoe

More information

Microsoft Word - 上傳電子檔.doc

Microsoft Word - 上傳電子檔.doc 創 新 設 計 研 究 所 碩 士 學 位 論 文 臺 北 聯 營 公 車 站 立 輔 助 設 備 最 適 設 置 高 度 研 究 In Search of the Appropriate Height for the Standing Support Equipments on the City Bus at Taipei 研 究 生 : 賴 振 永 指 導 教 授 : 黃 啟 梧 中 華 民 國

More information

<4D6963726F736F667420506F776572506F696E74202D20C8EDBCFEBCDCB9B9CAA6D1D0D0DEBDB2D7F92E707074>

<4D6963726F736F667420506F776572506F696E74202D20C8EDBCFEBCDCB9B9CAA6D1D0D0DEBDB2D7F92E707074> 软 件 架 构 师 研 修 讲 座 胡 协 刚 软 件 架 构 师 UML/RUP 专 家 szjinco@public.szptt.net.cn 中 国 软 件 架 构 师 网 东 软 培 训 中 心 小 故 事 : 七 人 分 粥 当 前 软 件 团 队 的 开 发 现 状 和 面 临 的 问 题 软 件 项 目 的 特 点 解 决 之 道 : 从 瀑 布 模 型 到 迭 代 模 型 解 决 项

More information

A study on the Fire Safety of Interior Finishes: Residential Buildings in Kaohsiung City taken as Examples between the interior design of safety and skill tests of Scholastic. A Thesis Submitted to Institute

More information

Abstract Due to the improving of living standards, people gradually seek lighting quality from capacityto quality. And color temperature is the important subject of it. According to the research from aboard,

More information

(baking powder) 1 ( ) ( ) 1 10g g (two level design, D-optimal) 32 1/2 fraction Two Level Fractional Factorial Design D-Optimal D

(baking powder) 1 ( ) ( ) 1 10g g (two level design, D-optimal) 32 1/2 fraction Two Level Fractional Factorial Design D-Optimal D ( ) 4 1 1 1 145 1 110 1 (baking powder) 1 ( ) ( ) 1 10g 1 1 2.5g 1 1 1 1 60 10 (two level design, D-optimal) 32 1/2 fraction Two Level Fractional Factorial Design D-Optimal Design 1. 60 120 2. 3. 40 10

More information

输电线路智能监测系统通信技术应用研究

输电线路智能监测系统通信技术应用研究 Smart Grid 智 能 电 网, 2014, 4, 11-15 http://dx.doi.org/10.12677/sg.2014.41003 Published Online February 2014 (http://www.hanspub.org/journal/sg.html) Application Research of Communication Technology for

More information

Microsoft PowerPoint - TTCN-Introduction-v5.ppt

Microsoft PowerPoint - TTCN-Introduction-v5.ppt Conformance Testing and TTCN 工研院無線通訊技術部林牧台 / Morton Lin 03-5912360 mtlin@itri.org.tw 1 Outline Introduction and Terminology Conformance Testing Process 3GPP conformance testing and test cases A real world

More information

Outline Speech Signals Processing Dual-Tone Multifrequency Signal Detection 云南大学滇池学院课程 : 数字信号处理 Applications of Digital Signal Processing 2

Outline Speech Signals Processing Dual-Tone Multifrequency Signal Detection 云南大学滇池学院课程 : 数字信号处理 Applications of Digital Signal Processing 2 CHAPTER 10 Applications of Digital Signal Processing Wang Weilian wlwang@ynu.edu.cn School of Information Science and Technology Yunnan University Outline Speech Signals Processing Dual-Tone Multifrequency

More information

Shanghai International Studies University THE STUDY AND PRACTICE OF SITUATIONAL LANGUAGE TEACHING OF ADVERB AT BEGINNING AND INTERMEDIATE LEVEL A Thes

Shanghai International Studies University THE STUDY AND PRACTICE OF SITUATIONAL LANGUAGE TEACHING OF ADVERB AT BEGINNING AND INTERMEDIATE LEVEL A Thes 上 海 外 国 语 大 学 硕 士 学 位 论 文 对 外 汉 语 初 中 级 副 词 情 境 教 学 研 究 与 实 践 院 系 : 国 际 文 化 交 流 学 院 学 科 专 业 : 汉 语 国 际 教 育 姓 名 : 顾 妍 指 导 教 师 : 缪 俊 2016 年 5 月 Shanghai International Studies University THE STUDY AND PRACTICE

More information

度 身 體 活 動 量 ; 芬 蘭 幼 兒 呈 現 中 度 身 體 活 動 量 之 比 例 高 於 臺 灣 幼 兒 (5) 幼 兒 在 投 入 度 方 面 亦 達 顯 著 差 異 (χ²=185.35, p <.001), 芬 蘭 與 臺 灣 幼 兒 多 半 表 現 出 中 度 投 入 與 高 度

度 身 體 活 動 量 ; 芬 蘭 幼 兒 呈 現 中 度 身 體 活 動 量 之 比 例 高 於 臺 灣 幼 兒 (5) 幼 兒 在 投 入 度 方 面 亦 達 顯 著 差 異 (χ²=185.35, p <.001), 芬 蘭 與 臺 灣 幼 兒 多 半 表 現 出 中 度 投 入 與 高 度 臺 灣 與 芬 蘭 幼 兒 園 室 內 自 由 遊 戲 內 涵 之 探 討 林 昭 溶 毛 萬 儀 經 國 管 理 暨 健 康 學 院 幼 兒 保 育 系 副 教 授 joyce@ems.cku.edu.tw 吳 敏 而 國 家 教 育 研 究 院 研 究 員 rozwu@mail.naer.edu.tw wanyi@ems.cku.edu.tw 摘 要 自 由 遊 戲 被 視 為 是 幼 兒 的

More information

<4D6963726F736F667420576F7264202D2032303130C4EAC0EDB9A4C0E04142BCB6D4C4B6C1C5D0B6CFC0FDCCE2BEABD1A15F325F2E646F63>

<4D6963726F736F667420576F7264202D2032303130C4EAC0EDB9A4C0E04142BCB6D4C4B6C1C5D0B6CFC0FDCCE2BEABD1A15F325F2E646F63> 2010 年 理 工 类 AB 级 阅 读 判 断 例 题 精 选 (2) Computer mouse How does the mouse work? We have to start at the bottom, so think upside down for now. It all starts with mouse ball. As the mouse ball in the bottom

More information

1505.indd

1505.indd 上 海 市 孙 中 山 宋 庆 龄 文 物 管 理 委 员 会 上 海 宋 庆 龄 研 究 会 主 办 2015.05 总 第 148 期 图 片 新 闻 2015 年 9 月 22 日, 由 上 海 孙 中 山 故 居 纪 念 馆 台 湾 辅 仁 大 学 和 台 湾 图 书 馆 联 合 举 办 的 世 纪 姻 缘 纪 念 孙 中 山 先 生 逝 世 九 十 周 年 及 其 革 命 历 程 特 展

More information

2 2 3 DLight CPU I/O DLight Oracle Solaris (DTrace) C/C++ Solaris DLight DTrace DLight DLight DLight C C++ Fortran CPU I/O DLight AM

2 2 3 DLight CPU I/O DLight Oracle Solaris (DTrace) C/C++ Solaris DLight DTrace DLight DLight DLight C C++ Fortran CPU I/O DLight AM Oracle Solaris Studio 12.2 DLight 2010 9 2 2 3 DLight 3 3 6 13 CPU 16 18 21 I/O DLight Oracle Solaris (DTrace) C/C++ Solaris DLight DTrace DLight DLight DLight C C++ Fortran CPU I/O DLight AMP Apache MySQL

More information

Microsoft PowerPoint - CH 04 Techniques of Circuit Analysis

Microsoft PowerPoint - CH 04 Techniques of Circuit Analysis Chap. 4 Techniques of Circuit Analysis Contents 4.1 Terminology 4.2 Introduction to the Node-Voltage Method 4.3 The Node-Voltage Method and Dependent Sources 4.4 The Node-Voltage Method: Some Special Cases

More information

10384 X2009230010 UDC The Design and Implementation of Small and Medium-sized Courier Company Logistics Vehicle Scheduling System 2012 06 Abstract With the arrival of the information age, tremendous

More information

Microsoft Word - 01李惠玲ok.doc

Microsoft Word - 01李惠玲ok.doc 康 寧 學 報 11:1-20(2009) 1 數 位 學 習 於 護 理 技 術 課 程 之 運 用 與 評 值 * 李 惠 玲 ** 高 清 華 *** 呂 莉 婷 摘 要 背 景 : 網 路 科 技 在 教 育 的 使 用 已 成 為 一 種 有 利 的 教 學 輔 助 工 具 網 路 教 學 的 特 性, 在 使 學 習 可 不 分 時 間 與 空 間 不 同 進 度 把 握 即 時 性 資

More information

Microsoft Word - 刘 慧 板.doc

Microsoft Word - 刘  慧 板.doc 中 国 环 境 科 学 2012,32(5):933~941 China Environmental Science 系 统 动 力 学 在 空 港 区 域 规 划 环 境 影 响 评 价 中 的 应 用 刘 慧 1,2, 郭 怀 成 1*, 盛 虎 1, 都 小 尚 1,3, 李 娜 1 1, 杨 永 辉 (1. 北 京 大 学 环 境 科 学 与 工 程 学 院, 北 京 100871; 2.

More information

% % 99% Sautman B. Preferential Policies for Ethnic Minorities in China The Case

% % 99% Sautman B. Preferential Policies for Ethnic Minorities in China The Case 1001-5558 2015 03-0037-11 2000 2010 C95 DOI:10.16486/j.cnki.62-1035/d.2015.03.005 A 1 2014 14CRK014 2013 13SHC012 1 47 2181 N. W. Journal of Ethnology 2015 3 86 2015.No.3 Total No.86 20 70 122000 2007

More information

致 谢 开 始 这 篇 致 谢 的 时 候, 以 为 这 是 最 轻 松 最 愉 快 的 部 分, 而 此 时 心 头 却 充 满 了 沉 甸 甸 的 回 忆 和 感 恩, 一 时 间 竟 无 从 下 笔 虽 然 这 远 不 是 一 篇 完 美 的 论 文, 但 完 成 这 篇 论 文 要 感 谢

致 谢 开 始 这 篇 致 谢 的 时 候, 以 为 这 是 最 轻 松 最 愉 快 的 部 分, 而 此 时 心 头 却 充 满 了 沉 甸 甸 的 回 忆 和 感 恩, 一 时 间 竟 无 从 下 笔 虽 然 这 远 不 是 一 篇 完 美 的 论 文, 但 完 成 这 篇 论 文 要 感 谢 中 国 科 学 技 术 大 学 博 士 学 位 论 文 论 文 课 题 : 一 个 新 型 简 易 电 子 直 线 加 速 器 的 关 键 技 术 研 究 学 生 姓 名 : 导 师 姓 名 : 单 位 名 称 : 专 业 名 称 : 研 究 方 向 : 完 成 时 间 : 谢 家 麟 院 士 王 相 綦 教 授 国 家 同 步 辐 射 实 验 室 核 技 术 及 应 用 加 速 器 物 理 2006

More information

國立中山大學學位論文典藏.PDF

國立中山大學學位論文典藏.PDF The Study on the New Pension Scheme for Civil Servants Evidence from Kaohsiung County I II 1. III Thesis Abstract Title of Thesis The Study on the New Pension Scheme for Civil Servants: Evidence from Kaohsiung

More information

苗 栗 三 山 國 王 信 仰 及 其 地 方 社 會 意 涵 The Influences and Implications of Local Societies to Three Mountain Kings Belief, in Taiwan Miaoli 研 究 生 : 林 永 恩 指 導

苗 栗 三 山 國 王 信 仰 及 其 地 方 社 會 意 涵 The Influences and Implications of Local Societies to Three Mountain Kings Belief, in Taiwan Miaoli 研 究 生 : 林 永 恩 指 導 國 立 交 通 大 學 客 家 文 化 學 院 客 家 社 會 與 文 化 學 程 碩 士 論 文 苗 栗 三 山 國 王 信 仰 及 其 地 方 社 會 意 涵 The Influences and Implications of Local Societies to Three Mountain Kings Belief, in Taiwan Miaoli 研 究 生 : 林 永 恩 指 導 教

More information

10384 27720071152270 UDC SHIBOR - Research on Dynamics of Short-term Shibor via Parametric and Nonparametric Models 2 0 1 0 0 5 2 0 1 0 0 5 2 0 1 0 0 5 2010 , 1. 2. Shibor 2006 10 8 2007 1 4 Shibor

More information

LH_Series_Rev2014.pdf

LH_Series_Rev2014.pdf REMINDERS Product information in this catalog is as of October 2013. All of the contents specified herein are subject to change without notice due to technical improvements, etc. Therefore, please check

More information

Construction of Chinese pediatric standard database A Dissertation Submitted for the Master s Degree Candidate:linan Adviser:Prof. Han Xinmin Nanjing

Construction of Chinese pediatric standard database A Dissertation Submitted for the Master s Degree Candidate:linan Adviser:Prof. Han Xinmin Nanjing 密 级 : 公 开 学 号 :20081209 硕 士 学 位 论 文 中 医 儿 科 标 准 数 据 库 建 设 研 究 研 究 生 李 楠 指 导 教 师 学 科 专 业 所 在 学 院 毕 业 时 间 韩 新 民 教 授 中 医 儿 科 学 第 一 临 床 医 学 院 2011 年 06 月 Construction of Chinese pediatric standard database

More information

IPCC CO (IPCC2006) 1 : = ( 1) 1 (kj/kg) (kgc/gj) (tc/t)

IPCC CO (IPCC2006) 1 : = ( 1) 1 (kj/kg) (kgc/gj) (tc/t) 2011 5 5 (278 ) China Industrial Economics May 2011 No.5 1 12 (1. 100005; 2. 066004) [ ] : ; ; : ; ; [ ] ; ; ; [ ]F290 [ ]A [ ]1006-480X(2011)05-0047-11 2008 CO 2 ( ) (2009) (GDP) (Binhocker et al. 2008)

More information

The Development of Color Constancy and Calibration System

The Development of Color Constancy and Calibration System The Development of Color Constancy and Calibration System The Development of Color Constancy and Calibration System LabVIEW CCD BMP ii Abstract The modern technologies develop more and more faster, and

More information

Time Estimation of Occurrence of Diabetes-Related Cardiovascular Complications by Ching-Yuan Hu A thesis submitted in partial fulfillment of the requi

Time Estimation of Occurrence of Diabetes-Related Cardiovascular Complications by Ching-Yuan Hu A thesis submitted in partial fulfillment of the requi Time Estimation of Occurrence of Diabetes-Related Cardiovascular Complications by Ching-Yuan Hu Master of Science 2011 Institute of Chinese Medical Sciences University of Macau Time Estimation of Occurrence

More information

附件1:

附件1: 附 件 1: 全 国 优 秀 教 育 硕 士 专 业 学 位 论 文 推 荐 表 单 位 名 称 : 西 南 大 学 论 文 题 目 填 表 日 期 :2014 年 4 月 30 日 数 学 小 组 合 作 学 习 的 课 堂 管 理 攻 硕 期 间 及 获 得 硕 士 学 位 后 一 年 内 获 得 与 硕 士 学 位 论 文 有 关 的 成 果 作 者 姓 名 论 文 答 辩 日 期 学 科 专

More information

UDC The Policy Risk and Prevention in Chinese Securities Market

UDC The Policy Risk and Prevention in Chinese Securities Market 10384 200106013 UDC The Policy Risk and Prevention in Chinese Securities Market 2004 5 2004 2004 2004 5 : Abstract Many scholars have discussed the question about the influence of the policy on Chinese

More information

A VALIDATION STUDY OF THE ACHIEVEMENT TEST OF TEACHING CHINESE AS THE SECOND LANGUAGE by Chen Wei A Thesis Submitted to the Graduate School and Colleg

A VALIDATION STUDY OF THE ACHIEVEMENT TEST OF TEACHING CHINESE AS THE SECOND LANGUAGE by Chen Wei A Thesis Submitted to the Graduate School and Colleg 上 海 外 国 语 大 学 SHANGHAI INTERNATIONAL STUDIES UNIVERSITY 硕 士 学 位 论 文 MASTER DISSERTATION 学 院 国 际 文 化 交 流 学 院 专 业 汉 语 国 际 教 育 硕 士 题 目 届 别 2010 届 学 生 陈 炜 导 师 张 艳 莉 副 教 授 日 期 2010 年 4 月 A VALIDATION STUDY

More information

University of Science and Technology of China A dissertation for master s degree Research of e-learning style for public servants under the context of

University of Science and Technology of China A dissertation for master s degree Research of e-learning style for public servants under the context of 中 国 科 学 技 术 大 学 硕 士 学 位 论 文 新 媒 体 环 境 下 公 务 员 在 线 培 训 模 式 研 究 作 者 姓 名 : 学 科 专 业 : 导 师 姓 名 : 完 成 时 间 : 潘 琳 数 字 媒 体 周 荣 庭 教 授 二 一 二 年 五 月 University of Science and Technology of China A dissertation for

More information

ABSTRACT ABSTRACT As we know the Sinology has a long history. As earily as 19 th century some works have already been done in this field. And among this the studies of lineages and folk beliefs in Southeast

More information

18 A B S 17.44±1() ±6.26( ) 54.23±5.5( ) 6.42±1.51() m 30m t α =.05 ( )AB 1 5 (p>.05)( )AB 1 5 (p<.05)( )A (p>.05)( )B (p<.05)( )A B

18 A B S 17.44±1() ±6.26( ) 54.23±5.5( ) 6.42±1.51() m 30m t α =.05 ( )AB 1 5 (p>.05)( )AB 1 5 (p<.05)( )A (p>.05)( )B (p<.05)( )A B The Effect of Different Training Method on Quick Coordination ability in High School Female Soccer Players 18 A B S 17.44±1() 161.39±6.26( ) 54.23±5.5( ) 6.42±1.51() 15 60-90 30m 30m t α =.05 ( )AB 1 5

More information

热设计网

热设计网 例 例 Agenda Popular Simulation software in PC industry * CFD software -- Flotherm * Advantage of Flotherm Flotherm apply to Cooler design * How to build up the model * Optimal parameter in cooler design

More information

A Study on the Relationships of the Co-construction Contract A Study on the Relationships of the Co-Construction Contract ( ) ABSTRACT Co-constructio in the real estate development, holds the quite

More information

東吳大學

東吳大學 律 律 論 論 療 行 The Study on Medical Practice and Coercion 林 年 律 律 論 論 療 行 The Study on Medical Practice and Coercion 林 年 i 讀 臨 療 留 館 讀 臨 律 六 礪 讀 不 冷 療 臨 年 裡 歷 練 禮 更 老 林 了 更 臨 不 吝 麗 老 劉 老 論 諸 見 了 年 金 歷 了 年

More information

PowerPoint Presentation

PowerPoint Presentation Current Status of the 10G-EPON Power Budget and Improvement David Li, Hisense-Ligent dli@ligentphotonics.com Ligent Supporters Bo Wang, China Telecom David Li, Ligent Photonics Frank Chang, Vitesse Corp.

More information

世新稿件end.doc

世新稿件end.doc Research Center For Taiwan Economic Development (RCTED) 2003 8 1 2 Study of Operational Strategies on Biotechnology Pharmaceutical Products Industry in Taiwan -- Case Study on Sinphar Pharmaceutical Company

More information

10384 200115009 UDC Management Buy-outs MBO MBO MBO 2002 MBO MBO MBO MBO 000527 MBO MBO MBO MBO MBO MBO MBO MBO MBO MBO MBO Q MBO MBO MBO Abstract Its related empirical study demonstrates a remarkable

More information

iml88-0v C / 8W T Tube EVM - pplication Notes. IC Description The iml88 is a Three Terminal Current Controller (TTCC) for regulating the current flowi

iml88-0v C / 8W T Tube EVM - pplication Notes. IC Description The iml88 is a Three Terminal Current Controller (TTCC) for regulating the current flowi iml88-0v C / 8W T Tube EVM - pplication Notes iml88 0V C 8W T Tube EVM pplication Notes Table of Content. IC Description.... Features.... Package and Pin Diagrams.... pplication Circuit.... PCB Layout

More information

10384 199928010 UDC 2002 4 2002 6 2002 2002 4 DICOM DICOM 1. 2. 3. Canny 4. 5. DICOM DICOM DICOM DICOM I Abstract Eyes are very important to our lives. Biologic parameters of anterior segment are criterions

More information

A dissertation for Master s degree Metro Indoor Coverage Systems Analysis And Design Author s Name: Sheng Hailiang speciality: Supervisor:Prof.Li Hui,

A dissertation for Master s degree Metro Indoor Coverage Systems Analysis And Design Author s Name: Sheng Hailiang speciality: Supervisor:Prof.Li Hui, 中 国 科 学 技 术 大 学 工 程 硕 士 学 位 论 文 地 铁 内 移 动 通 信 室 内 覆 盖 分 析 及 应 用 作 者 姓 名 : 学 科 专 业 : 盛 海 亮 电 子 与 通 信 导 师 姓 名 : 李 辉 副 教 授 赵 红 媛 高 工 完 成 时 间 : 二 八 年 三 月 十 日 University of Science and Technology of Ch A dissertation

More information

13 A DSS B DSS C DSS D DSS A. B. C. CPU D. 15 A B Cache C Cache D L0 L1 L2 Cache 16 SMP A B. C D 17 A B. C D A B - C - D

13 A DSS B DSS C DSS D DSS A. B. C. CPU D. 15 A B Cache C Cache D L0 L1 L2 Cache 16 SMP A B. C D 17 A B. C D A B - C - D 2008 1 1 A. B. C. D. UML 2 3 2 A. B. C. D. 3 A. B. C. D. UML 4 5 4 A. B. C. D. 5 A. B. C. D. 6 6 A. DES B. RC-5 C. IDEA D. RSA 7 7 A. B. C. D. TCP/IP SSL(Security Socket Layer) 8 8 A. B. C. D. 9 9 A. SET

More information

Important Notice SUNPLUS TECHNOLOGY CO. reserves the right to change this documentation without prior notice. Information provided by SUNPLUS TECHNOLO

Important Notice SUNPLUS TECHNOLOGY CO. reserves the right to change this documentation without prior notice. Information provided by SUNPLUS TECHNOLO Car DVD New GUI IR Flow User Manual V0.1 Jan 25, 2008 19, Innovation First Road Science Park Hsin-Chu Taiwan 300 R.O.C. Tel: 886-3-578-6005 Fax: 886-3-578-4418 Web: www.sunplus.com Important Notice SUNPLUS

More information

Thesis for the Master degree in Engineering Research on Negative Pressure Wave Simulation and Signal Processing of Fluid-Conveying Pipeline Leak Candi

Thesis for the Master degree in Engineering Research on Negative Pressure Wave Simulation and Signal Processing of Fluid-Conveying Pipeline Leak Candi U17 10220 UDC624 Thesis for the Master degree in Engineering Research on Negative Pressure Wave Simulation and Signal Processing of Fluid-Conveying Pipeline Leak Candidate:Chen Hao Tutor: Xue Jinghong

More information

Untitled-3

Untitled-3 SEC.. Separable Equations In each of problems 1 through 8 solve the given differential equation : ü 1. y ' x y x y, y 0 fl y - x 0 fl y - x 0 fl y - x3 3 c, y 0 ü. y ' x ^ y 1 + x 3 x y 1 + x 3, y 0 fl

More information

Abstract The purpose of this study is to analyze mother s meta-emotion philosophy and children s emotional adjustment in order to understand the actual interaction between mother s meta-emotion philosophy

More information

[ ],,,,,,,,,,,,,,, [ ] ; ; ; [ ]F120 [ ]A [ ] X(2018) , :,,,, ( ),,,,,,, [ ] [ ],, 5

[ ],,,,,,,,,,,,,,, [ ] ; ; ; [ ]F120 [ ]A [ ] X(2018) , :,,,, ( ),,,,,,, [ ] [ ],, 5 2018 4 [ ] [ ] ; ; ; [ ]F120 [ ]A [ ]1006-480X(2018)04-0005-14 : ( ) [ ] 2018-03-02 [ ] :jinpei8859@163.com 5 : ( ) ( ) : ( ) ( ) (1983) : 6 2018 4 ; (1983) : : ; : ; : ; ; : W G W G W G 7 :?! : ( ) (

More information

國立中山大學學位論文典藏.PDF

國立中山大學學位論文典藏.PDF 93 2 () ()A Study of Virtual Project Team's Knowledge Integration and Effectiveness - A Case Study of ERP Implementation N924020024 () () ()Yu ()Yuan-Hang () ()Ho,Chin-Fu () ()Virtual Team,Knowledge Integration,Project

More information

2005 5,,,,,,,,,,,,,,,,, , , 2174, 7014 %, % 4, 1961, ,30, 30,, 4,1976,627,,,,, 3 (1993,12 ),, 2

2005 5,,,,,,,,,,,,,,,,, , , 2174, 7014 %, % 4, 1961, ,30, 30,, 4,1976,627,,,,, 3 (1993,12 ),, 2 3,,,,,, 1872,,,, 3 2004 ( 04BZS030),, 1 2005 5,,,,,,,,,,,,,,,,, 1928 716,1935 6 2682 1928 2 1935 6 1966, 2174, 7014 %, 94137 % 4, 1961, 59 1929,30, 30,, 4,1976,627,,,,, 3 (1993,12 ),, 2 , :,,,, :,,,,,,

More information

C doc

C doc No. C2004010 2004-7 No. C2004010 2004 7 16 1 No. C2004010 2004 7 16 2000 1990 1990 2000 ( ),, 2 / 2000 1990 1990 2000 ( ),, (1952-) 100871 It is not Appropriate to Remove the Birth Spacing Policy Now,

More information

<4D6963726F736F667420506F776572506F696E74202D20B5DAD2BBD5C228B4F2D3A1B0E6292E707074205BBCE6C8DDC4A3CABD5D>

<4D6963726F736F667420506F776572506F696E74202D20B5DAD2BBD5C228B4F2D3A1B0E6292E707074205BBCE6C8DDC4A3CABD5D> Homeworks ( 第 三 版 ):.4 (,, 3).5 (, 3).6. (, 3, 5). (, 4).4.6.7 (,3).9 (, 3, 5) Chapter. Number systems and codes 第 一 章. 数 制 与 编 码 . Overview 概 述 Information is of digital forms in a digital system, and

More information

2005 3,, :,,,, (),,,,, [],,,,,,,,,,, (),, (,, ),,,,,,,,,,,,,,,,,,,,,,,,,( ),, :,,, :,?,,, :,1999,

2005 3,, :,,,, (),,,,, [],,,,,,,,,,, (),, (,, ),,,,,,,,,,,,,,,,,,,,,,,,,( ),, :,,, :,?,,, :,1999, (), 70,,,,,, :,31, 1555,;18,,, ;,, (1895),, 1 2005 3,, :,,,, (),,,,, [],,,,,,,,,,, (),, (,, ),,,,,,,,,,,,,,,,,,,,,,,,,( ),, :,,, :,?,,, :,1999,24 25 2 (),,,,,,, 1970,,,,,, ;,; :,,?? 1988,,, 1987 1990,,,

More information