Skip to content
This repository has been archived by the owner on Oct 23, 2020. It is now read-only.

Threading optimization: bfb framework changes by Abhinav #1237

Merged

Conversation

mark-petersen
Copy link
Contributor

@mark-petersen mark-petersen commented Feb 9, 2017

This PR replaces #1151, and includes only bfb changes to framework. This includes:

  1. Implementation of threading into the mpas reconstruct routine.
  2. Changing MPI threading level from multiple to funneled.
  3. Reorganization in buffer pack and unpack in halo exchanges to minimize use of barriers.
  4. Implementation of threaded memory buffer initializations.

@mark-petersen
Copy link
Contributor Author

@akturner @mgduda @matthewhoffman Could each of you please do a test merge (or cherry pick) of these changes and make sure they pass your standard tests? It was bit-for-bit on the ocean model nightly regression suite using gnu.

@mark-petersen mark-petersen changed the title Threading optimization: bfb framework changes by Abhinov Threading optimization: bfb framework changes by Abhinav Feb 9, 2017
@akturner
Copy link
Contributor

akturner commented Feb 9, 2017

This passed the MPAS-Seaice tests for bfb.

@matthewhoffman
Copy link
Member

This passes MPASLI bfb.

Copy link
Contributor

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look fine to me. I'm able to get bit-identical results with these changes, and there are apparently no bad interactions with the OpenMP strategy that we're using in MPAS-Atmosphere. I haven't found a measurable speedup with these changes, but this is likely because MPAS-Atmosphere is not using any scratch fields or the "new" halo exchange methods at present.

Copy link
Member

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any threaded code in the LI core, so the fact that this passes the LI regression suite is sufficient reason for me to approve this PR.

Copy link
Contributor

@akturner akturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests bfb with the sea ice core. Sea ice has no threading so otherwise doesnt affect us.

@mark-petersen
Copy link
Contributor Author

mark-petersen commented Mar 10, 2017

This commit DID pass the ACME PET test, which means a bit-for-bit comparison with and without threading. Horray!

process: I started with the ACME-Dev/master, it does pass PET. Added this single commit to the MPAS-O submodule, and it again passes PET

Here are the locations:

edison07> pwd
/scratch2/scratchdirs/mpeterse/acme_scratch/edison/PET_Ln9.T62_oQU240.GMPAS-NYF.edison_intel.20170310_092301/run
edison07> tail -n 9 *out

SUMMARY of cprnc:
 A total number of    329 fields were compared
          of which      0 had non-zero differences
               and      0 had differences in fill patterns
 A total number of      0 fields could not be analyzed
 A total number of      0 fields on file 1 were not found on file2.
  diff_test: the two files seem to be IDENTICAL

WAIT - I did not update all three cores. Rerunning test, we shall see...

mark-petersen added a commit to E3SM-Project/E3SM that referenced this pull request Mar 24, 2017
This only updates to:
Threading optimization: bfb framework changes by Abhinav #1237
MPAS-Dev/MPAS#1237
@mark-petersen mark-petersen force-pushed the framework/openMP_Abhinov_bfb branch from 8019269 to 62e69d7 Compare April 5, 2017 23:03
@mark-petersen
Copy link
Contributor Author

@@ -265,25 +266,26 @@ subroutine mpas_reconstruct_2d(meshPool, u, uReconstructX, uReconstructY, uRecon
enddo ! iCell
!$omp end do

call mpas_threading_barrier()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipwjones From my git diff command, it looks like mpas_vector_reconstruction.F was the only file you changed in #1285. Specifically, this threading barrier is the only substantive line, others are just spacing. Please check that these changes look like your intended changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct @mark-petersen . Sorry for the cosmetic changes - guess I'm a little obsessive-compulsive and can't resist lining some things up. Looking at the other ocean-related bfb mods, I suspect your previous non-passing of PET ACME tests were possibly related to your branch not having some of the updated mods from the previous non-deterministic fixes. At least one of those mods was in that commit. Anyway, have moved on to testing the mods for the ocean bfb threading PR next.

mark-petersen added a commit that referenced this pull request Apr 6, 2017
… ocean/develop

With the addition of a new threaded reconstruction routine in #1237,
openMP directives surrounding calls to this routine in the ocean core
needed to be removed for consistency.
@mark-petersen
Copy link
Contributor Author

@philipwjones just to verify, this updated version, along with #1286, passed the ACME PET test on all the machines? Are you done testing?

@philipwjones
Copy link
Contributor

Still trying to get a working configuration on Titan - CIME5.2 still has a few bugs. Hope to have something running pretty soon. I'm sufficiently confident though, so if you need to go ahead with this, I think it'll be fine.

@mark-petersen
Copy link
Contributor Author

We group our MPAS updates into ACME, which makes rolling back a single PR difficult. I will wait until you've completed the Titan tests. That has the additional advantage that if you have some minor updates to this PR required by Titan, you can simply push it to my fork as an extra commit.

@philipwjones
Copy link
Contributor

@mark-petersen The previous commits in MPAS (framework + consistency mods) pass the PET tests on Titan so you can go ahead with those. However, one of the routines in the second round of ocean b4b mods is failing. Have it narrowed down to somewhere in the barotropic solve mods, so will hopefully have that second round of threading improvements ready to go early next week.

@mark-petersen
Copy link
Contributor Author

Great, thanks a lot @philipwjones. The log module should be merged onto ACME Monday. Then I will rebase and do a few more quick tests. Then I will rebase the other two PRs as well. Anything fixes you find before then can just be pushed to my branches with an additional commit.

This adds an mpas_threading_barrier.
Originally added in PR #1285
@mark-petersen mark-petersen force-pushed the framework/openMP_Abhinov_bfb branch from 62e69d7 to 28487f7 Compare May 9, 2017 13:07
@mark-petersen
Copy link
Contributor Author

Rebased and tested again, due to substantial changes with new log module since this was created.

@mark-petersen mark-petersen merged commit 28487f7 into MPAS-Dev:develop May 9, 2017
@mark-petersen mark-petersen deleted the framework/openMP_Abhinov_bfb branch May 9, 2017 13:38
@jonbob
Copy link
Contributor

jonbob commented Aug 7, 2017

merged into ACME via PR #1630

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants