Removing preprocessor directives to re-enable print statements on GPU for debug and other conditions.

Original problem:
-----------------

Following feedback that debug information is still desirable for OpenACC device-executed
code where possible, this change removes all preprocessor directives that guarded against
compiling statements which write to standard output. These directives were originally added
because debug statements and other standard output could greatly reduce performance: certain
variables had to be copied from the host to the device solely for debug output. Additionally,
when such statements were located within parallel-execution regions, the output was not
guaranteed to appear in any specific order, and the additional IF-branches would also have
reduced performance, since branching is inefficient on SIMD architectures.
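
As a rough illustration of the kind of guard being removed, the sketch below uses C with
OpenACC pragmas. The commit's own source is not shown here, so the macro name
ENABLE_GPU_PRINT, the function count_negative, and the array q are all hypothetical. When
the macro is undefined, the print statement is never compiled into the device loop. When it
is defined, the output from a parallel loop may appear in any order, and device-side printf
support depends on the compiler.

```c
#include <stdio.h>

/* Hypothetical macro: the commit does not name the macro it removed.
   Uncomment to compile the print statement into the loop. */
/* #define ENABLE_GPU_PRINT */

/* Counts negative entries in q[0..ite-1]; all names are illustrative. */
int count_negative(const double *q, int ite)
{
    int n = 0;

    #pragma acc parallel loop reduction(+:n) copyin(q[0:ite])
    for (int i = 0; i < ite; ++i) {
        if (q[i] < 0.0) {
            n += 1;
#ifdef ENABLE_GPU_PRINT
            /* Only present in the build when the macro is defined;
               output order across iterations is not guaranteed. */
            printf("negative value at column %d\n", i);
#endif
        }
    }
    return n;
}
```

Built without OpenACC support, the pragmas are ignored and the function runs as ordinary
host code, which is convenient for checking the logic.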

Resolutions:
------------

With a bit of extra work, several of these issues are alleviated so that output works again
as requested. First, on the data side, the impact of pulling in variables just for debugging
was minimized by keeping that data resident on the GPU for the entire subroutine execution.
While this increases the memory footprint on a device that may have very limited memory, it
reduces the performance hit from data transfers. Next, where debug output was not within
parallel regions but still needed to execute on the GPU to show the correct values at that
point in the program, OpenACC serial regions were used. These avoid transferring the data
off the GPU mid-execution just to display debug output, and they also partially solve the
problem of out-of-order output. Since debug regions are guarded by IF blocks, these serial
regions do not significantly impact performance when debug output is turned off
(debug_code=0). The slowdown is significant at any other debug level, which should be
acceptable for debugging situations.
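
The pattern described above can be sketched as follows, again in C with OpenACC pragmas
and with hypothetical names (column_sum_with_debug, temp). This is a minimal sketch, not
the commit's code: debug_code is passed as a runtime argument purely for illustration, and
how the commit actually sets it is not shown. A data region keeps the array resident on the
device for the whole routine, and the debug print sits in a serial region behind an IF so
nothing extra is launched when debug_code is 0.

```c
#include <assert.h>
#include <stdio.h>

/* Sums temp[0..ite-1] on the device and optionally prints a debug value.
   All names are illustrative assumptions, not taken from the commit. */
double column_sum_with_debug(const double *temp, int ite, int debug_code)
{
    double total = 0.0;

    assert(temp != NULL && ite > 0);

    /* Keep temp resident on the device for the entire routine, so the
       debug output below needs no additional host/device transfer. */
    #pragma acc data copyin(temp[0:ite])
    {
        #pragma acc parallel loop reduction(+:total)
        for (int i = 0; i < ite; ++i) {
            total += temp[i];
        }

        /* Debug output outside the parallel loop: a serial region runs
           on the device, so the data stays where it is and the output
           order is deterministic. The IF skips it when debug is off. */
        if (debug_code > 0) {
            #pragma acc serial present(temp[0:ite])
            {
                printf("debug: temp[0] = %f\n", temp[0]);
            }
        }
    }
    return total;
}
```

Calling column_sum_with_debug(temp, ite, 0) skips the serial region entirely, while any
positive debug_code prints one line from the device (or from the host when the pragmas are
ignored).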

Performance Changes:
--------------------

These changes accomplish the goal of re-enabling debugging output, but not entirely without
cost. Overall GPU runtime was impacted when tested with 150k and 750k vertical columns (the
value of ite used in the i-loops) and debugging turned off (debug_code=0). For 150k columns,
GPU runtime increased from the original baseline of 22 ms to 30 ms (about 1.4x). For 750k
columns, it increased from the original baseline of 31 ms to 70 ms (about 2.3x). The impact
is greater for the larger column count because the mid-loop IF branches are evaluated more
times on the GPU. While these are declines in performance, both cases are still significant
speedups over the CPU-only tests (8.7x and 18.7x speedups for 150k and 750k, respectively).

Compilation Time Changes:
-------------------------

One additional observation concerns compilation time. When all debug output is disabled
(debug_code=0), compilation takes approximately 90 seconds with the additional serial blocks,
IF-branches, and so forth, as each of these requires more work from the OpenACC compiler to
generate code for the GPU. This problem is compounded when the debug_code option is increased
to either 1 (some debug output) or 2 (full debug output). At a value of 1, compilation time
jumps to approximately 12.5 minutes on the Hera GPU nodes. At a value of 2, it increases
further to approximately 18.5 minutes on the same GPU nodes. The explanation is that the
OpenACC compiler must enable greater amounts of serial and branching code that (again) are
less optimal on the GPU, and so it must do more work to optimize them as best it can.
timsliwinski-noaa committed Aug 28, 2023
1 parent 95e9ff2 commit 36a313e
Showing 1 changed file with 49 additions and 63 deletions.