Redundant caching in assim_tools_mod.f90 #368
Conversation
…pletely after testing
…_x arrays (allocation, definition of get_close_x_cached() subroutines, subroutine calls, deallocation, etc.)
looks good to me.
Looks great. Thanks for implementing, testing and timing. We should try to get this into use by as many users as possible. Kevin with the reanalysis is a good candidate.
@mjs2369
I think this slipped by review: the close_X arrays should be intent(inout), otherwise they are undefined on entry to the subroutine.
Edit: fix on this branch main...get_close_caching_inout
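For context, here is a minimal Fortran sketch of why intent(out) would break this caching scheme. This is hypothetical code with simplified names and arguments, not the actual assim_tools_mod interface: an intent(out) dummy argument is undefined on entry, so a cache hit that returns without writing to the array would hand back undefined data, while intent(inout) preserves the previous call's results.

! Hypothetical sketch, not the actual DART interface; names are simplified.
module intent_demo
   implicit none
   integer :: last_base = -1   ! sentinel: nothing cached yet
contains
   subroutine get_close_cached(base, close_ind)
      integer, intent(in)    :: base
      ! Must be intent(inout): an intent(out) dummy is undefined on entry,
      ! so the early return on a cache hit would leave close_ind undefined.
      integer, intent(inout) :: close_ind(:)

      if (base == last_base) return   ! cache hit: keep previous contents

      close_ind = base                ! stand-in for the real computation
      last_base = base
   end subroutine get_close_cached
end module intent_demo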
The caching was fixed in #368.
Description:
Removal of redundant caching in the get_close_obs_cached and get_close_state_cached subroutines in DART/assimilation_code/modules/assimilation/assim_tools_mod.f90. The removed lines are the redundant copies
last_close_X_dist = close_X_dist
and
last_close_X_ind = close_X_ind
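To illustrate the pattern, here is a hedged sketch with simplified names and arguments; compute_close stands in for the real get_close search, and none of this is the actual DART code. The cached wrapper already holds its results in one set of arrays, so the extra copies into the last_close_X arrays duplicated memory traffic on every cache miss:

! Hypothetical, simplified sketch of the removed pattern -- not the real code.
module get_close_cache_demo
   implicit none
   real    :: last_base_loc  = huge(1.0)   ! sentinel: nothing cached yet
   integer :: last_num_close = 0
contains
   subroutine get_close_obs_cached(base_loc, num_close, close_ind, close_dist)
      real,    intent(in)    :: base_loc
      integer, intent(out)   :: num_close
      integer, intent(inout) :: close_ind(:)
      real,    intent(inout) :: close_dist(:)

      if (base_loc == last_base_loc) then
         ! cache hit: the arrays still hold the previous call's results
         num_close = last_num_close
         return
      end if

      call compute_close(base_loc, num_close, close_ind, close_dist)
      last_base_loc  = base_loc
      last_num_close = num_close
      ! The redundant copies removed by this PR sat at this point, e.g.
      !    last_close_X_dist = close_X_dist
      !    last_close_X_ind  = close_X_ind
      ! duplicating every result into a second set of arrays.
   end subroutine get_close_obs_cached

   subroutine compute_close(base_loc, num_close, close_ind, close_dist)
      real,    intent(in)    :: base_loc
      integer, intent(out)   :: num_close
      integer, intent(inout) :: close_ind(:)
      real,    intent(inout) :: close_dist(:)
      num_close     = 1              ! stand-in for the real search
      close_ind(1)  = 1
      close_dist(1) = abs(base_loc)
   end subroutine compute_close
end module get_close_cache_demo

With the copies gone, a cache hit in this sketch costs only a scalar comparison, which is consistent with the large MPAS_ATM speedup reported below.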
Fixes issue
Fixes #364
Types of changes
(optimization)
Documentation changes needed?
Tests
Tested with WRF, cam-fv, and MPAS_ATM. Test cases can be found in /glade/scratch/hkershaw/DART/siparcs2022/test_cases
The outputs of ensemble runs with the revised code are bitwise identical to WRF and cam-fv runs on the main branch.
WRF and cam-fv were also built with all debugging flags enabled; these builds produced no relevant errors or warnings.
The revisions had no effect on peak memory usage.
Removing the caching greatly reduced the runtime of MPAS_ATM, by about a factor of 3. The runtime of WRF remained consistent with executions on the main branch. cam-fv also improved, but by much less than MPAS_ATM: about 40 seconds off an 11-minute run on the main branch.
The results are detailed below. Note that runtime varies slightly across executions, so I have included the results of multiple runs under the time command.
WRF was run locally using the command
mpirun -n 4 ./filter
WRF (mpirun -n 4 ./filter, get_close_caching branch)
real 0m3.797s
user 0m11.026s
sys 0m2.190s
real 0m3.335s
user 0m10.114s
sys 0m1.410s
WRF (mpirun -n 4 ./filter, main branch)
real 0m3.424s
user 0m10.477s
sys 0m1.420s
————————————
cam-fv was run on interactive jobs on Cheyenne (select=8:ncpus=36:mpiprocs=36)
cam_fv (./filter, get_close_caching branch)
real 10m19.723s
user 368m17.440s
sys 0m37.255s
real 10m20.472s
user 368m43.953s
sys 0m37.571s
cam_fv (./filter, main branch)
real 11m5.138s
user 395m32.690s
sys 0m36.716s
real 11m2.530s
user 393m58.820s
sys 0m37.390s
————————————
MPAS_ATM was run on interactive jobs on Cheyenne (select=18:ncpus=36:mpiprocs=36)
mpas_atm (./filter, get_close_caching branch)
real 1m21.758s
user 42m53.521s
sys 0m23.436s
real 1m25.355s
user 45m1.992s
sys 0m22.445s
real 1m23.928s
user 44m26.556s
sys 0m21.748s
mpas_atm (./filter, main branch)
real 4m36.890s
user 160m3.367s
sys 0m22.718s
real 4m45.462s
user 165m3.165s
sys 0m23.599s
real 6m35.792s
user 171m29.205s
sys 0m22.464s
Checklist for merging
Testing Datasets
Test cases can be found in the following directories on Cheyenne:
/glade/scratch/hkershaw/DART/siparcs2022/test_cases/wrf
/glade/scratch/hkershaw/DART/siparcs2022/test_cases/CAM/CAM-Run
/glade/scratch/hkershaw/DART/siparcs2022/test_cases/mpas_atm/mpas_test