Loop Blocking for fn GPU Backend (#1787)
Implements loop blocking for the GPU fn backend. Thread block size (that is, CUDA/HIP threads per block) and loop block size (that is, loop iterations per CUDA/HIP thread) can now be specified as template parameters.

Further changes:
- Set `__launch_bounds__` in the fn GPU kernel based on the thread block size.
- Activate vertical loop blocking in the fn nabla kernels on newer CUDA versions that support `GT_PROMISE`.

Performance changes:
- `__launch_bounds__` affects performance of the `fn_cartesian_vertical_advection` benchmark significantly (positively or negatively, depending on domain size).
- Performance of fn nabla benchmarks improves significantly on newer CUDA versions.
- Performance on Daint is currently reduced because its CUDA version is too old.
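To make the two template parameters concrete, here is a minimal host-side C++ sketch of the index arithmetic behind loop blocking. It is not the code from this commit: the names `blocking`, `thread_body`, `emulate_launch`, and `visit_counts` are hypothetical, the example is one-dimensional, and the strided assignment of iterations to threads is an assumption chosen for illustration. In an actual CUDA/HIP kernel, `thread_body` would be the kernel body, and the kernel would presumably be annotated with `__launch_bounds__(ThreadBlockSize)`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not the GridTools API): compile-time blocking parameters.
// ThreadBlockSize = threads per block, LoopBlockSize = loop iterations per thread.
template <int ThreadBlockSize, int LoopBlockSize>
struct blocking {
    static_assert(ThreadBlockSize > 0 && LoopBlockSize > 0, "sizes must be positive");
    // Number of loop iterations covered by one thread block.
    static constexpr int tile = ThreadBlockSize * LoopBlockSize;
    // Number of thread blocks needed to cover n iterations (rounded up).
    static constexpr int num_blocks(int n) { return (n + tile - 1) / tile; }
};

// Work done by a single (emulated) GPU thread: LoopBlockSize iterations,
// strided by ThreadBlockSize so that neighbouring threads touch neighbouring indices.
// The strided layout is an assumption made for this sketch.
template <int ThreadBlockSize, int LoopBlockSize, class F>
void thread_body(int block_idx, int thread_idx, int n, F &f) {
    const int base = block_idx * blocking<ThreadBlockSize, LoopBlockSize>::tile + thread_idx;
    for (int k = 0; k < LoopBlockSize; ++k) {
        const int i = base + k * ThreadBlockSize;
        if (i < n) // guard the partially filled last block
            f(i);
    }
}

// Sequential stand-in for a kernel launch: loops over all blocks and threads.
template <int ThreadBlockSize, int LoopBlockSize, class F>
void emulate_launch(int n, F f) {
    const int blocks = blocking<ThreadBlockSize, LoopBlockSize>::num_blocks(n);
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < ThreadBlockSize; ++t)
            thread_body<ThreadBlockSize, LoopBlockSize>(b, t, n, f);
}

// Counts how often each index in [0, n) is visited by the blocked loop.
template <int ThreadBlockSize, int LoopBlockSize>
std::vector<int> visit_counts(int n) {
    assert(n >= 0);
    std::vector<int> counts(n, 0);
    emulate_launch<ThreadBlockSize, LoopBlockSize>(n, [&counts](int i) { ++counts[i]; });
    return counts;
}
```

Calling `visit_counts<4, 3>(29)` emulates three blocks of four threads with three iterations each. Every index should be visited exactly once, with the bounds check skipping the unused tail of the last block. Making both sizes template parameters lets the compiler fully unroll the inner loop and lets the launch bound be derived at compile time.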
Showing 13 changed files with 87,166 additions and 86,975 deletions.