pgi_acc support is broken on Titan #1006
Comments
Note that this has something to do with the "preamble" at the beginning of the env_mach_specific script. We may have to go back to the old style without the module purge etc. I don't know. Not my area of expertise, and will reassign this to @mrnorman unless someone else thinks that they can figure this out. Not something that I know how to complain to the OLCF about, since I do not know the reason behind the current logic.
Just noticed that env_mach_specific.titan was working fine as recently as yesterday. I did go ahead and submit a bug report to the OLCF. Hopefully this can be corrected quickly.
OLCF staff had no insight. They were able to reproduce the problem in csh or tcsh, but it worked when starting from bash: "My shell is set to bash and I am able to run all the commands you listed, even if I start a csh shell. One of my colleagues has his shell set to tcsh and he sees the same issue." (env_mach_specific is a csh script.) They did suggest: "We generally recommend against running 'module purge' since it can sometimes
On Edison and Cori, Tagging a related issue #1004.
That was the style that we used to use on Titan. I thought that the purge was added to address a module loading problem. Unfortunately @mrnorman is not available to explain the history of the current logic.
The new style was introduced only 3 weeks ago, in PR #958.
@ndkeen, this should look familiar:
I'll try the new init routine. @mrnorman, since @ndkeen already worked through this at NERSC, you should touch base with Noel and decide on a common, robust style.
Update: /opt/modules/3.2.10.4/init/csh has its own problems. Guess that we will have to back off the module purge-style logic and reinstate the explicit module rms.
Yuck. You could ask about why those env variables are not being wiped clean with a purge.
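One way to gather evidence for such a report is to snapshot the environment variable names before and after the purge and list the ones that survive. The following is only a minimal sketch in POSIX sh, not anything taken from env_mach_specific: the helper names `snapshot_env` and `surviving_vars`, the file names, and the `PGI|NETCDF|HDF5` pattern are all hypothetical, and it assumes a `module` command is available in the shell where the snapshots are taken.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; names and patterns are illustrative only.

# snapshot_env FILE
# Write the sorted names of all current environment variables to FILE.
# Multi-line values may produce a few spurious names; good enough for a quick check.
snapshot_env() {
    env | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' | LC_ALL=C sort -u > "$1"
}

# surviving_vars BEFORE AFTER PATTERN
# Print variable names that appear in both snapshots and match the
# extended regular expression PATTERN (prints nothing if none match).
surviving_vars() {
    LC_ALL=C comm -12 "$1" "$2" | grep -E "$3" || true
}

# Intended use on a system with Environment Modules (left commented out
# because `module` is an assumption about the host):
#   snapshot_env before.txt
#   module purge
#   snapshot_env after.txt
#   surviving_vars before.txt after.txt 'PGI|NETCDF|HDF5'
```

Any names printed by the last step were set before the purge and are still set afterwards, which narrows down what to ask the OLCF about. Note that a variable can legitimately survive if it was never set by a module in the first place.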
@mrnorman is back from vacation now, and I think that he has a fix (correct, Matt?). The default modules were changed today, and another error crept in (hdf5 version). Updating netcdf and hdf to the latest solves the problem in my experiments.
I never saw the error:
But the "pgi not found" error and the netcdf version errors are fixed. I'm also updating to the latest defaults for libsci, mpich, and atp. Tests should be done by the end of today, and I'll update master then.
Alright, I'm giving up module purge entirely.
This was a csh/tcsh issue that OLCF were able to address at their end. FYI.
by something that has changed on the OLCF side?
Building with pgi or pgi_acc (and I assume with intel as well) generates:
This does not prevent pgi builds, because we are using the default version of pgi. pgi_acc fails when it tries to switch to a different version.
@mrnorman is on vacation. This almost sounds like the issue that @ndkeen had to deal with when module loads were documented in two different lists? Noel - does this look familiar?
In any case, easily documented, and I will submit a report to help at OLCF. Just a note that this problem has been reported.