problem with PGI compiler update on Titan when targeting pgiacc #1620

Closed
worleyph opened this issue Jul 8, 2017 · 4 comments · Fixed by #1767

Comments

worleyph commented Jul 8, 2017

Just updated master, did

 ./create_newcase -case XXX -compset A_WCYCL2000 -res ne30_oEC -mach titan -compiler pgiacc -project cli115

then

 cd XXX
 ./case.setup

and got

 ERROR: module command /opt/modules/default/bin/modulecmd python switch pgi 17.5.lustre failed with message:
 pgi(3):ERROR:105: Unable to locate a modulefile for '17.5.lustre'

However, this works fine for '-compiler pgi', and I can't see any difference between the pgi and pgiacc logic in env_mach_specific.xml.

I'll keep poking, to see if I can determine where the problem is coming from.

worleyph commented Jul 8, 2017

Found it - a typo in env_mach_specific.xml for pgiacc:

currently:

  <command name="switch">pgi 17.5.lustre</command>

Should be

  <command name="switch">pgi pgi/17.5.lustre</command>
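A check of this kind could catch the same typo elsewhere in the file. The following is a minimal sketch, not part of CIME: the function name, the sample fragment, and the enclosing `<modules>` element are assumptions made for illustration. It flags any `switch` command whose second argument begins with a digit, on the heuristic that a bare version string with no `name/` prefix is probably a mistake.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the two <command> lines quoted above;
# the enclosing <modules> element and its attribute are assumptions.
SAMPLE = """
<modules compiler="pgiacc">
  <command name="switch">pgi 17.5.lustre</command>
  <command name="switch">pgi pgi/17.5.lustre</command>
</modules>
"""

def suspicious_switch_commands(xml_text):
    """Return 'switch' commands whose target looks like a bare version string."""
    root = ET.fromstring(xml_text.strip())
    flagged = []
    for cmd in root.iter("command"):
        if cmd.get("name") != "switch":
            continue
        parts = (cmd.text or "").split()
        # Heuristic: "<loaded module> <module>/<version>" is expected, so a
        # second argument that starts with a digit is treated as suspect.
        if len(parts) == 2 and parts[1][0].isdigit():
            flagged.append(" ".join(parts))
    return flagged

flagged = suspicious_switch_commands(SAMPLE)
print(flagged)
```

The digit test is deliberately narrow so that legitimate switches between two module names without a version are not flagged.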

worleyph commented Jul 8, 2017

Note, for pgiacc also need to update Macros.make, changing

 LDFLAGS:= -time -Wl,--allow-multiple-definition -acc -ta=tesla,pin,cuda7.5,cc35

to

 LDFLAGS:= -time -Wl,--allow-multiple-definition -acc -ta=tesla,pinned,cuda7.5,cc35

and similarly for Depends.titan.pgiacc (changing 'pin' to 'pinned'). When fixing the env_mach_specific.xml typo, we should also make these changes (to config_compilers.xml and Depends.titan.pgiacc, both in cime/config/acme/machines/).
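The 'pin' to 'pinned' change can be applied mechanically. The sketch below is illustrative only: `fix_ta_pin` is a hypothetical helper, not something in CIME, and it assumes the suboptions of `-ta=` are comma-separated as in the LDFLAGS line above. It rewrites only an exact `pin` suboption, so a line that already says `pinned` is left alone.

```python
import re

def fix_ta_pin(line):
    """Replace the 'pin' suboption with 'pinned' inside any -ta=... flag."""
    def repl(match):
        # Split the comma-separated suboptions and rename only an exact 'pin'.
        subopts = match.group(1).split(",")
        subopts = ["pinned" if s == "pin" else s for s in subopts]
        return "-ta=" + ",".join(subopts)
    return re.sub(r"-ta=(\S+)", repl, line)

old = "LDFLAGS:= -time -Wl,--allow-multiple-definition -acc -ta=tesla,pin,cuda7.5,cc35"
new = fix_ta_pin(old)
print(new)
```

Applying the function twice gives the same result as applying it once, which makes it safe to run over files that have already been partly updated.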

minxu74 commented Jul 9, 2017

@worleyph @mrnorman I noticed that typo for pgiacc after I merged it to master. I plan to fix it later, together with the compiler option changes in the Macros and Depends files.

jgfouca pushed a commit that referenced this issue Jul 14, 2017
Test suite: Ran the following smoke tests from the latest cesm alpha06 branch:
SMS_D.f09_g16_gl20.TG1.yellowstone_gnu
SMS_D.f09_g16_gl20.TG.yellowstone_gnu
SMS.f09_g16_gl20.TG1.yellowstone_gnu
SMS.f09_g16_gl20.TG.yellowstone_gnu
Test baseline: N/A
Test namelist changes: none
Test status: bit for bit

Fixes #1612

User interface changes?: none

Code review:

mrnorman commented Sep 4, 2017

My internet connection is poor enough that I will have to wait until next week. @minxu74, hopefully your fix will work.

minxu74 closed this as completed in d6d6c3a Sep 8, 2017
minxu74 added a commit that referenced this issue Oct 27, 2017
Early on, it was determined that the PGI compiler required that
the CPU target be specified to be istanbul (instead of the actual
processor interlagos) in order for ACME (or CESM) to be
reproducible with respect to changes in process or thread counts.
Here this global specification of istanbul as the CPU target is
removed and only applied to the files that require it. Based
on experimentation, only CAM and MPASLI required
the modified CPU target, and we eventually identified a small
number of files within CAM for which this is required. We have not
yet examined these files to determine why these files require it.
We also have not yet looked at individual MPASLI files, and
instead continue to apply the istanbul cpu target for builds
of the entire GLC component.

Other changes include removing the dependence on the version of
pgi/17.5.0 installed in Dave Norton's directories, moving instead
to the now official OLCF installation. Also fixed a typo in the
module switch command for the PGI version used with PGIACC, and
changed the 'pin' flag to 'pinned' for PGIACC, as 'pin' is no
longer legal.

[Non-BFB] (on Titan when using PGI compiler)

Fixes #1620
Fixes #1610

(Note #1610 might need to be reopened, but this nominally addresses the primary issues.)

* worleyph/machines/Titan_pgi_updates:
  Add istanbul cpu target back for GLC component
  Restrict use of istanbul cpu target

Conflicts:
	cime/config/acme/machines/Depends.titan.pgiacc

Solved the conflict caused by changing the name of "solver_init_mod"
to "model_init_mod" in Depends.titan.pgiacc introduced by PR #1824
minxu74 added a commit that referenced this issue Oct 27, 2017
jgfouca pushed a commit that referenced this issue Nov 10, 2017
jgfouca pushed a commit that referenced this issue Feb 27, 2018
jgfouca pushed a commit that referenced this issue Mar 14, 2018
rljacob pushed a commit that referenced this issue May 6, 2021