Remove the default-loaded module craype-hugepages2M which interferes with GNU on Cori #3129

Merged
ndkeen merged 2 commits into master from ndk/machinefiles/cori-nohugepages
Aug 19, 2019

Conversation

ndkeen

@ndkeen ndkeen commented Aug 15, 2019

After the Cori software upgrade, one change is that the craype-hugepages2M module is now loaded by default. This PR simply adds a command to explicitly remove that module for cori-knl and cori-haswell.

There are a few hugepages modules (at different sizes: 2M, 4M, 8M, etc.) that can be loaded, and each requires a rebuild to use. For some apps, using hugepages can affect performance variability and reduce the cost of accessing memory in several MPI_Alltoall operations. However, initial tests when porting to Cori showed no performance difference from using them. I would just leave the module loaded (in favor of default machine settings), but for whatever reason, our GNU-built executables are failing unless the module is unloaded.
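To confirm whether a hugepages module is still active in a given build or run environment, something like the following could be used. This is only a minimal sketch: it assumes the module system on Cori exports the standard colon-separated `LOADEDMODULES` environment variable, and the helper name `loaded_hugepages_modules` is hypothetical, not part of E3SM or CIME.

```python
import os

# Module family named in this PR; sizes other than 2M share the same prefix.
HUGEPAGES_PREFIX = "craype-hugepages"


def loaded_hugepages_modules(loadedmodules):
    """Return the hugepages entries found in a colon-separated module list.

    `loadedmodules` is expected to look like the LOADEDMODULES variable
    (an assumption about the environment, see the note above).
    """
    entries = [name for name in loadedmodules.split(":") if name]
    return [name for name in entries if name.startswith(HUGEPAGES_PREFIX)]


if __name__ == "__main__":
    found = loaded_hugepages_modules(os.environ.get("LOADEDMODULES", ""))
    if found:
        print("hugepages modules still loaded:", ", ".join(found))
    else:
        print("no hugepages module loaded")
```

Running this in the same shell used for the build should show whether the default-loaded module was actually removed before compiling.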

[bfb]
Fixes #3127

ndkeen added 2 commits August 14, 2019 14:34
…loaded after Cori upgrade as it causes issues with GNU-built executables. In the past, the hugepages module had no impact on our performance.

Also, remove the PERL env variable that was being temporarily used to point
to our own copy of the Switch PERL module as it should now be installed on Cori's login nodes.

[bfb]
@worleyph

hugepages is also causing memory failures (where they did not occur before) in F cases with very large meshes, e.g. ne1024.

@ndkeen

ndkeen commented Aug 15, 2019

Are we sure we can point to the hugepages module as the reason for failure? I think Cray/NERSC would certainly not expect there to be any faults from this module, though that doesn't mean there isn't one.

@worleyph

This solved my build failure, and it is the only change I made.

ndkeen added a commit that referenced this pull request Aug 15, 2019
@ndkeen

ndkeen commented Aug 15, 2019

merged to next

@ndkeen ndkeen merged commit 55d16bf into master Aug 19, 2019
@ndkeen ndkeen deleted the ndk/machinefiles/cori-nohugepages branch May 1, 2023 19:48
Successfully merging this pull request may close these issues.

Problems with GNU-built E3SM executables on Cori after SW upgrade