Omega on GPUs with >1 node needs MPICH_GPU_SUPPORT_ENABLED=1 #275
Comments
This problem was reported in E3SM-Project/Omega#196. Here is my exact test sequence, using Polaris to set up the test case. Details:
@cbegeman and @xylar, once I ran with the
Here I was watching the GPU usage with
and could see that the GPUs are in use for all of the above commands as follows: I need to understand why fewer GPUs are faster above, but for these convergence tests it appears that
works fine. I will need to do more research into these flag settings for the performance tests.
@mark-petersen Glad to hear it. We can follow up elsewhere on whether we want
I have added the missing environment variables to mache here: For this to propagate to Polaris, we will need a new mache release and then to update the mache version here in Polaris. That should happen pretty soon.
Polaris runs with Omega on GPUs failed with >1 node. We need `MPICH_GPU_SUPPORT_ENABLED=1` in the environment. When I add this setting by hand, the test `ocean/planar/manufactured_solution/convergence_both/default` passes. Without it, the test fails at `forward_50km_75s`.
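For anyone who needs the workaround before a fix lands, here is a minimal sketch of setting the variable by hand from Python. It is not how Polaris or mache handles it. The helper name `env_with_gpu_aware_mpi` is hypothetical, and the child process is a stand-in for the real multi-node launch command, which this issue does not specify.

```python
import os
import subprocess
import sys


def env_with_gpu_aware_mpi(base_env=None):
    """Return a copy of an environment with MPICH_GPU_SUPPORT_ENABLED=1.

    Hypothetical helper for illustration only. The input mapping is left
    unmodified; os.environ is used when no mapping is given.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MPICH_GPU_SUPPORT_ENABLED"] = "1"
    return env


if __name__ == "__main__":
    env = env_with_gpu_aware_mpi()
    # Stand-in child process: it only reports the variable it inherited.
    # Replace this command with the job launch line used on your machine.
    child = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('MPICH_GPU_SUPPORT_ENABLED'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    print("child sees MPICH_GPU_SUPPORT_ENABLED =", child.stdout.strip())
```

The helper copies the environment rather than editing `os.environ`, so the setting applies only to the launched job.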