Redundant caching in assim_tools_mod.f90 #368

Merged
merged 8 commits into main from get_close_caching on Jul 14, 2022

Conversation

@mjs2369 (Contributor) commented Jul 11, 2022

Description:

Removal of redundant caching in the get_close_obs_cached and get_close_state_cached subroutines in DART/assimilation_code/modules/assimilation/assim_tools_mod.f90.

  • Make close_X arrays intent(inout)
  • Remove the explicit caching and copying via last_close_X_dist = close_X_dist and last_close_X_ind = close_X_ind
  • Remove all other instances of the last_close_X_dist and last_close_X_ind arrays (a simplified sketch of the resulting pattern follows this list)
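
As a rough illustration of the revised pattern, here is a minimal, self-contained sketch; the module, subroutine, and variable names are hypothetical and the actual routines in assim_tools_mod.f90 take more arguments. The point is that with intent(inout) dummy arrays, a cache hit can simply return, because the values written on the previous call are still valid in the caller.

module get_close_cache_sketch
   implicit none
   integer, parameter :: r8 = selected_real_kind(12)   ! stands in for DART's r8 kind
   real(r8) :: last_base = huge(1.0_r8)                ! "base" value of the previous query
contains
   subroutine get_close_cached(base, close_ind, close_dist)
      real(r8), intent(in)    :: base
      integer,  intent(inout) :: close_ind(:)          ! intent(inout): last call's results persist in the caller
      real(r8), intent(inout) :: close_dist(:)
      integer :: i
      if (base == last_base) return                    ! cache hit: nothing to copy, just return
      do i = 1, size(close_ind)                        ! cache miss: recompute (placeholder work)
         close_ind(i)  = i
         close_dist(i) = abs(base - real(i, r8))
      end do
      last_base = base
   end subroutine get_close_cached
end module get_close_cache_sketch

Before this change, the cache-hit branch instead copied the module-level last_close_X_dist and last_close_X_ind arrays back into the dummy arguments on every call; removing those array copies is the source of the runtime savings reported below.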

Fixes issue

Fixes #364

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

(This change is a performance optimization.)

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Tested with WRF, cam-fv, and MPAS_ATM. Test cases can be found in /glade/scratch/hkershaw/DART/siparcs2022/test_cases

For WRF and cam-fv, the outputs of ensemble runs with the revised code are bitwise identical to runs on the main branch.

WRF and cam-fv were also built with all debugging flags enabled, which produced no relevant errors or warnings.

The revisions had no effect on peak memory usage.

Removing the caching reduced the MPAS_ATM runtime by about a factor of 3. The WRF runtime remained consistent with executions on the main branch. The cam-fv runtime also improved, though by much less than MPAS_ATM (about 40 seconds off an 11-minute run on the main branch).

The results are detailed below. Because runtime varies slightly between executions, I have included the time command output from multiple runs of each case.

WRF was run locally using the command mpirun -n 4 ./filter
WRF (mpirun -n 4 ./filter, get_close_caching branch)
real 0m3.797s
user 0m11.026s
sys 0m2.190s

real 0m3.335s
user 0m10.114s
sys 0m1.410s

WRF (mpirun -n 4 ./filter, main branch)
real 0m3.424s
user 0m10.477s
sys 0m1.420s

————————————
cam-fv was run on interactive jobs on Cheyenne (select=8:ncpus=36:mpiprocs=36)
cam_fv (./filter, get_close_caching branch)
real 10m19.723s
user 368m17.440s
sys 0m37.255s

real 10m20.472s
user 368m43.953s
sys 0m37.571s

cam_fv (./filter, main branch)
real 11m5.138s
user 395m32.690s
sys 0m36.716s

real 11m2.530s
user 393m58.820s
sys 0m37.390s

————————————
MPAS_ATM was run on interactive jobs on Cheyenne (select=18:ncpus=36:mpiprocs=36)
mpas_atm (./filter, get_close_caching branch)
real 1m21.758s
user 42m53.521s
sys 0m23.436s

real 1m25.355s
user 45m1.992s
sys 0m22.445s

real 1m23.928s
user 44m26.556s
sys 0m21.748s

mpas_atm (./filter, main branch)
real 4m36.890s
user 160m3.367s
sys 0m22.718s

real 4m45.462s
user 165m3.165s
sys 0m23.599s

real 6m35.792s
user 171m29.205s
sys 0m22.464s

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Version tag

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

Test cases can be found in the following directories on Cheyenne:
/glade/scratch/hkershaw/DART/siparcs2022/test_cases/wrf
/glade/scratch/hkershaw/DART/siparcs2022/test_cases/CAM/CAM-Run
/glade/scratch/hkershaw/DART/siparcs2022/test_cases/mpas_atm/mpas_test

@nancycollins (Collaborator) left a comment

looks good to me.

@jlaucar (Contributor) left a comment

Looks great. Thanks for implementing, testing and timing. We should try to get this into use by as many users as possible. Kevin with the reanalysis is a good candidate.

@hkershaw-brown (Member) commented

@mjs2369
Quick note for when you do the CHANGELOG and release: you can say "performance improvement" rather than "bug fix". This lets people know that nothing in the code was incorrect; the change just makes it go faster.

@mjs2369 merged commit cbe0d41 into main on Jul 14, 2022
@mjs2369 deleted the get_close_caching branch on July 14, 2022 19:14
@hkershaw-brown (Member) commented Jul 20, 2022

I think this slipped by review: the close_X arrays should be intent(inout), otherwise they are undefined on entry to the subroutine.

integer,  intent(inout) :: close_obs_ind(:)
real(r8), intent(inout) :: close_obs_dist(:)
integer,  intent(inout) :: close_state_ind(:)
real(r8), intent(inout) :: close_state_dist(:)

Edit: fix on this branch main...get_close_caching_inout
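
For context, the relevant Fortran rule is that an intent(out) dummy argument becomes undefined on entry to the procedure, so a routine that may return early and rely on values the caller already holds must declare the argument intent(inout). A minimal standalone sketch with hypothetical names:

module intent_demo
   implicit none
   integer, parameter :: r8 = selected_real_kind(12)   ! stands in for DART's r8 kind
contains
   subroutine fill_if_needed(already_filled, vals)
      logical,  intent(in)    :: already_filled
      real(r8), intent(inout) :: vals(:)   ! with intent(out), vals would be undefined at this point
      if (already_filled) return           ! early return relies on values the caller already holds
      vals = 0.0_r8                        ! otherwise (re)fill the array
   end subroutine fill_if_needed
end module intent_demo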

@hkershaw-brown (Member) commented Jul 20, 2022

Also, this is passing a module global as an argument to a subroutine. (Ignore this comment; it is not a global.)

hkershaw-brown added a commit referencing this pull request on Sep 2, 2022: "the caching was fixed in #368"
Merging this pull request closes issue #364: redundant copying of get_close_caching data in filter_assim.