-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase optimization for ugwp_driver_v0.F #76
Conversation
if(CMAKE_BUILD_TYPE STREQUAL "Release" AND ${CMAKE_Fortran_COMPILER_ID} STREQUAL "Intel") | ||
SET_SOURCE_FILES_PROPERTIES(${LOCAL_CURRENT_SOURCE_DIR}/physics/ugwp_driver_v0.F | ||
APPEND_STRING PROPERTY COMPILE_FLAGS " ${CMAKE_Fortran_FLAGS_PHYSICS} -fp-model=fast -fprotect-parens -fimf-precision=high") | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also restrict this with if(FASTER)
so that the original Release
compilation options don't change.
Will do.
…On Tue, May 30, 2023 at 7:48 AM Samuel Trahan (NOAA contractor) < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In CMakeLists.txt
<#76 (comment)>
:
> +if(CMAKE_BUILD_TYPE STREQUAL "Release" AND ${CMAKE_Fortran_COMPILER_ID} STREQUAL "Intel")
+ SET_SOURCE_FILES_PROPERTIES(${LOCAL_CURRENT_SOURCE_DIR}/physics/ugwp_driver_v0.F
+ APPEND_STRING PROPERTY COMPILE_FLAGS " ${CMAKE_Fortran_FLAGS_PHYSICS} -fp-model=fast -fprotect-parens -fimf-precision=high")
+endif()
You should also restrict this with if(FASTER) so that the original Release
compilation options don't change.
—
Reply to this email directly, view it on GitHub
<#76 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2GYTZA3LFHCJNG5BV3XIXUA3ANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Sam,
Protecting this change with a check for use of -DFASTER does limit the
consequences to the following regression tests even though other tests run
with a FASTER executable. However, that also limits the benefits of this
optimization to those projects that know about and use the -DFASTER flag.
cpld_control_p8_faster 017 failed in check_result
cpld_control_p8_faster 017 failed in run_test
control_p8_faster 077 failed in check_result
control_p8_faster 077 failed in run_test
Interestingly, all of the following RT executables are compiled with the
new flags.
fv3_006.exe
fv3_012.exe
fv3_021.exe
fv3_032.exe
Most of the RT tests that enable some form of the ugwp algorithm are
compiled without FASTER. The exceptions are the tests listed above..
In particular, test hafs_regional_storm_following_1nest_atm_ocn_wav is
compiled with FASTER, but the ugwp option is disabled in both parent and
nest namelists.
Not sure how to proceed.
…On Tue, May 30, 2023 at 9:09 AM Dan Kokron ***@***.***> wrote:
Will do.
On Tue, May 30, 2023 at 7:48 AM Samuel Trahan (NOAA contractor) <
***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In CMakeLists.txt
> <#76 (comment)>
> :
>
> > +if(CMAKE_BUILD_TYPE STREQUAL "Release" AND ${CMAKE_Fortran_COMPILER_ID} STREQUAL "Intel")
> + SET_SOURCE_FILES_PROPERTIES(${LOCAL_CURRENT_SOURCE_DIR}/physics/ugwp_driver_v0.F
> + APPEND_STRING PROPERTY COMPILE_FLAGS " ${CMAKE_Fortran_FLAGS_PHYSICS} -fp-model=fast -fprotect-parens -fimf-precision=high")
> +endif()
>
> You should also restrict this with if(FASTER) so that the original
> Release compilation options don't change.
>
> —
> Reply to this email directly, view it on GitHub
> <#76 (review)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ACODV2GYTZA3LFHCJNG5BV3XIXUA3ANCNFSM6AAAAAAYTC2UB4>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
|
The purpose of the FASTER flag is to test a new combination of compilation options that will eventually replace the current Release. We're keeping the original options unmodified during a transition period so modelers can test the new and old options together.. |
Got it. I'll push the change and update the supporting text.
…On Tue, May 30, 2023 at 3:33 PM Samuel Trahan (NOAA contractor) < ***@***.***> wrote:
Protecting this change with a check for use of -DFASTER does limit the
consequences to the following regression tests even though other tests run
with a FASTER executable. However, that also limits the benefits of this
optimization to those projects that know about and use the -DFASTER flag.
The purpose of the FASTER flag is to test a new combination of compilation
options that will eventually replace the current Release. We're keeping the
original options unmodified during a transition period so modelers can test
the new and old options together..
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2HU273VWLMUOQPWGILXIZKQZANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
If there are specific HAFS tests you want to run with |
Sam,
I think its odd that there isn't a test in the suite that uses the same
physics settings as HAFS operations.
Dan
…On Tue, May 30, 2023 at 4:06 PM Samuel Trahan (NOAA contractor) < ***@***.***> wrote:
If there are specific HAFS tests you want to run with -DFASTER=ON, we can
add regression tests. You can see some _faster tests in tests/rt.conf for
other models that are experimenting with this.
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2CPPTSKX3DZKMURLG3XIZOONANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Thanks @dkokron. I see that there are not upstream PRs into fv3atm and ufs-weather-model for this change. Since you're only changing CMakelists, it would be great to combine this with another PR, but since this changes the RT baselines by itself, it makes it trickier since we don't like to combine PRs that change baselines for different reasons. It looks like #77 doesn't change baselines and is a candidate for combining this into that one, which would make upstream PRs unnecessary. I'll inquire about doing this. |
Also, like @SamuelTrahanNOAA said, we can ask about either adding a test, or ideally, modifying and existing test to not add to the testing load. I think that @DusanJovic-NOAA, @junwang-noaa, @BrianCurtis-NOAA and @BinLiu-NOAA would be good to ask their opinion on updating rt.conf or individual tests for HAFS. |
My preference is to update the existing tests, if they don't accurately represent planned HAFS configurations. |
Correct, I haven't started the upstream PRs yet. Let me know how you want
to handle #76
…On Fri, Jun 2, 2023 at 9:23 AM Grant Firl ***@***.***> wrote:
Thanks @dkokron <https://github.com/dkokron>. I see that there are not
upstream PRs into fv3atm and ufs-weather-model for this change. Since
you're only changing CMakelists, it would be great to combine this with
another PR, but since this changes the RT baselines by itself, it makes it
trickier since we don't like to combine PRs that change baselines for
different reasons. It looks like #77
<#77> doesn't change
baselines and is a candidate for combining this into that one, which would
make upstream PRs unnecessary. I'll inquire about doing this.
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2DN2BDXJD7W23DQRVLXJHZMTANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I think the way we should handle that is to separate the PRs. Have this PR only update the optimization options. Then use another PR to update the HAFS regression test configuration. The HAFS regression test reconfiguration could be rolled into my ufs-community/ufs-weather-model#1769. That one corrects some RRFS regression tests, and the only code changes don't change the results. |
Why are the compiler flags changed only for one file? What is so special about this file? Have you tried to use these flags for all other files? Are you getting reproducible answers on all platforms? |
Tells the compiler to use more aggressive optimizations when implementing floating-point calculations. These optimizations increase speed, but may affect the accuracy or reproducibility of floating-point computations. I'd suggest we do more testing before adding this feature to "FASTER" option, we need to confirm the accuracy. We are maintaining reproducibility for "Release" mode. |
Dusan,
There is a general reluctance to apply "-fp-model=fast" everywhere and I
did thorough numerical testing only for fv3_ugwp_solv2_v0().
Dan
…On Fri, Jun 2, 2023 at 10:04 AM Dusan Jovic ***@***.***> wrote:
Why are the compiler flags changed only for one file? What is so special
about this file? Have you tried to use these flags for all other files? Are
you getting reproducible answers on all platforms?
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2B7B56MXJIMMRQV4ULXJH6IZANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Jun,
I described the numerical testing I did in the PR. Please describe what
further testing is needed.
Dan
…On Fri, Jun 2, 2023 at 10:28 AM Jun Wang ***@***.***> wrote:
From
https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/fp-model-fp.html
:
Tells the compiler to use more aggressive optimizations when implementing
floating-point calculations. These optimizations increase speed, but may
affect the accuracy or reproducibility of floating-point computations.
I'd suggest we do more testing before adding this feature to "FASTER"
option, we need to confirm the accuracy. We are maintaining reproducibility
for "Release" mode.
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2CATZ3YOZER4CI625LXJIBC7ANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
That might be a question for HAFS group. UFS RT is just one canned case. From my point of view, cpld_control_p8 and control_p8 producing different results already shows the results will change when using this option. But I understand there is a balance on speed and the reproducibility. So it's each application's decision to use this option or not. |
GWD should be only taking a very small portion of the CPU time of the entire physics package. I'd suggest the developer looking inside the code to find out where the bottleneck is, and optimizing the code if possible. Simply changing the compiler option without checking the code may not be the best approach ( informing @mdtoyNOAA ) |
Fanglin,
I included before and after timings in the original PR. No subroutine
consumes more than a few percent of walltime, but making improvements to several routines does make a difference.
Dan
…On Fri, Jun 2, 2023 at 11:05 AM Fanglin Yang ***@***.***> wrote:
GWD should be only taking a very small portion of the CPU time of the
entire physics package. I'd suggest the developer looking inside the code
to find out where the bottleneck is, and optimizing the code if possible.
Simply changing the compiler option without checking the code may not be
the best approach ( informing @mdtoyNOAA <https://github.com/mdtoyNOAA> )
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2DSLPZBRNH3IZKPDY3XJIFLJANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@dkokron @junwang-noaa @DusanJovic-NOAA @yangfanglin Coming back to Dusan's and Fanglin's comments, do we want to go forward with a one file compiler option change for the HAFS application like this PR suggests? Or do we want to explore adding flags to the rest of physics (since I would think GWD would not be high on the list of schemes eating up compute cycles) or asking the developer of the scheme to make it work faster? On one hand, if this change really makes that much of a difference for the HAFS application, I don't want to stand in the way, but on the other hand, I agree with @DusanJovic-NOAA and @yangfanglin that this is an oddly "targeted" solution for speeding up physics. |
The targeted nature is due to the overall hesitance in the HAFS group to
make changes that alter the numerics in unknown ways.
…On Tue, Jun 6, 2023 at 4:23 PM Grant Firl ***@***.***> wrote:
@dkokron <https://github.com/dkokron> @junwang-noaa
<https://github.com/junwang-noaa> @DusanJovic-NOAA
<https://github.com/DusanJovic-NOAA> @yangfanglin
<https://github.com/yangfanglin> Coming back to Dusan's and Fanglin's
comments, do we want to go forward with a one file compiler option change
for the HAFS application like this PR suggests? Or do we want to explore
adding flags to the rest of physics (since I would think GWD would not be
high on the list of schemes eating up compute cycles) or asking the
developer of the scheme to make it work faster?
On one hand, if this change really makes that much of a difference for the
HAFS application, I don't want to stand in the way, but on the other hand,
I agree with @DusanJovic-NOAA <https://github.com/DusanJovic-NOAA> and
@yangfanglin <https://github.com/yangfanglin> that this is an oddly
"targeted" solution for speeding up physics.
—
Reply to this email directly, view it on GitHub
<#76 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACODV2FVTJTKY3P2IAWMAZTXJ6NTVANCNFSM6AAAAAAYTC2UB4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@BinLiu-NOAA @ZhanZhang-NOAA @junwang-noaa @DusanJovic-NOAA @yangfanglin Do any of you have strong opinions about whether this change is desired? I know that some reservations were expressed. Should we continue merging this or close it? |
After talking with UFS code managers (including @BinLiu-NOAA for HAFS), it was decided to close this PR as-is and to instead explore adding the extra compilation flags to all physics. See #86. |
Performance profiling of a HAFS case on NOAA systems revealed significant of time was spent in subroutine fv3_ugwp_solv2_v0(). This pull request (PR) allows the compiler to fully optimize this routine while maintaining round-off level differences.
How Has This Been Tested?
My testing includes the creation of an offline driver for fv3_ugwp_solv2_v0(). I extracted full volume (all ranks and threads) inputs to and outputs from fv3_ugwp_solv2_v0() at time step 290 (5.8 hours) into a HAFS case. The offline driver feeds the saved inputs into fv3_ugwp_solv2_v0() and compares the output against the saved ground truth. The driver gives zero-diff output when compiled with the flags used in production.
I found a set of compile options that allows the fv3_ugwp_solv2_v0() to run more than 2x faster than the baseline yet gives absolute differences of +- e-14. Numerical stats attached as ugwp_results.txt.
ugwp_results.txt
The driver and associated input and output files are located at
alogin02:/lfs/h1/hpc/support/daniel.kokron/HAFS/hafsv1_final/T2O_2020092200_17L/UGWPsolv2V0/driver.f90
alogin02:/lfs/h1/hpc/support/daniel.kokron/HAFS/hafsv1_final/T2O_2020092200_17L/UGWPsolv2V0/IO
This PR is a non zero-diff change, so the baseline will need to be regenerated.
I ran the rt.sh suite on acorn and cactus using the "-c" option then again with "-m". The logs from both runs on both machines indicated "REGRESSION TEST WAS SUCCESSFUL"
Performance metric:
Add up the phase1 and phase2 timings printed in the output listing
grep PASS stdout | awk '{t+=$10;print t}' | tail -1
The units are seconds.
I ran a 126-hour simulation on acorn using a 26-node case.
acorn:/lfs/h1/hpc/support/daniel.kokron/HAFS/hafsv1_final/T2O_2020092200_17L
I also ran a 24-hour simulation on cactus using a 47-node case provided to me by Bin Liu
cactus:/lfs/h2/emc/ptmp/bin.liu/hafsv1_merge_hfsb/2021082712/09L/forecast
The units are seconds.
Based on these timings, I expect savings from this PR of ~(52.8s)*5=264 seconds for a full 5-day simulation.