2018 CSDMS meeting-079
Accelerating Block-Structured Adaptive Mesh Refinement (AMR) with GPUs

Graphics Processing Units (GPUs) have been shown to be very successful in accelerating simulations in many fields. When they are used to accelerate simulations of earthquakes and tsunamis, a major challenge arises from the use of adaptive mesh refinement (AMR) in the code, which is often necessary for capturing dynamically evolving small-scale features without requiring excessive resolution in other regions of the domain.
Clawpack is an open-source library for solving general hyperbolic wave-propagation problems with AMR. It is the basis for the GeoClaw package used for modeling tsunamis, storm surge, and floods, and it has also been used for coupled seismic-tsunami simulations. Recently, we accelerated the library with GPUs and observed a speed-up of 2.5x on a benchmark problem using AMR on an NVIDIA K20 GPU. We added many functions that facilitate the execution of computing kernels. Customized, CPU-thread-safe memory managers manage GPU and CPU memory pools, which is essential for eliminating the overhead of memory allocation and de-allocation. A global reduction is conducted on each AMR grid patch to dynamically adjust the time step. To avoid copying fluxes at cell edges back from GPU memory to CPU memory, the conservation fixes required between patches on different refinement levels are also performed on the GPU. Several of these kernels are merged into larger kernels, which greatly reduces the overhead of launching CUDA kernels.
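To illustrate the kind of per-patch reduction described above, the following CUDA sketch computes the maximum wave speed over a single AMR patch and uses it to choose a stable time step from a CFL condition. This is a minimal sketch under stated assumptions: the kernel and helper names (max_speed_kernel, compute_dt_for_patch), the flat speeds array, and the chosen block size are illustrative and are not part of Clawpack's actual GPU implementation.

// Illustrative sketch only; not Clawpack's GPU API.
#include <cuda_runtime.h>
#include <cfloat>

// Reduce the maximum wave speed over one AMR patch (one speed per cell).
__global__ void max_speed_kernel(const double* speeds, int n, double* block_max)
{
    extern __shared__ double smax[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Each thread loads one cell's wave speed (or -DBL_MAX past the end).
    smax[tid] = (i < n) ? speeds[i] : -DBL_MAX;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x is a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) smax[tid] = fmax(smax[tid], smax[tid + s]);
        __syncthreads();
    }
    if (tid == 0) block_max[blockIdx.x] = smax[0];
}

// Host helper: derive dt for one patch from the reduced maximum speed.
double compute_dt_for_patch(const double* d_speeds, int n, double dx, double cfl)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    double* d_block_max;
    cudaMalloc(&d_block_max, blocks * sizeof(double));
    max_speed_kernel<<<blocks, threads, threads * sizeof(double)>>>(d_speeds, n, d_block_max);

    // Finish the reduction on the host; the number of blocks is small.
    double* h_block_max = new double[blocks];
    cudaMemcpy(h_block_max, d_block_max, blocks * sizeof(double), cudaMemcpyDeviceToHost);
    double s_max = -DBL_MAX;
    for (int b = 0; b < blocks; ++b) s_max = fmax(s_max, h_block_max[b]);

    delete[] h_block_max;
    cudaFree(d_block_max);
    return cfl * dx / s_max;   // CFL condition: dt <= cfl * dx / s_max
}

In a full AMR code this reduction would be issued per patch, and the minimum dt over all patches on a level would be taken before advancing; keeping the speeds resident on the GPU avoids the host-device copies the abstract describes.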