SMS.f19_f19.I1850CLM45CN test fails with seg-fault on Cascade with NAG compiler #276

Closed
singhbalwinder opened this issue Jul 20, 2015 · 7 comments

@singhbalwinder
Contributor

The SMS.f19_f19.I1850CLM45CN test case fails with the following error on Cascade with the NAG compiler:

[g1:mpi_rank_26][error_sighandler] Caught error: Segmentation fault (signal 11)
[g3:mpi_rank_52][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: g3: task 52: Segmentation fault (core dumped)
srun: Terminating job step 1354215.0

The model crashes after 48 time steps (~1 day into the run), and only when run with 64 tasks. With 1 or 8 tasks, the model runs to completion.

@rljacob
Member

rljacob commented Nov 23, 2015

Can you reproduce this on blues?

@rljacob
Member

rljacob commented Jan 7, 2016

This can be investigated on blues with its NAG compiler.

@singhbalwinder
Contributor Author

I am looking into it now.

On Thu, Jan 7, 2016 at 10:32 AM, Robert Jacob notifications@github.com
wrote:

This can be investigated on blues with its NAG compiler.



@yjinho
Contributor

yjinho commented Jan 7, 2016

I haven't merged #597 yet. I thought I'd wait until things are sorted out.

@rljacob
Member

rljacob commented Jan 7, 2016

@singhbalwinder you don't have to. It's assigned to @bishtgautam.

@singhbalwinder
Contributor Author

OK. I had just started on it, and I am running into the same kind of problems
I had with the atm code. NAG doesn't like the following:

#ifdef SOMETHING
module a_module
....
....
end module a_module
#endif

We have to change this to:

module a_module
#ifdef SOMETHING
....
....
#endif
end module a_module

Basically, NAG doesn't like compiling an empty file, which is what the first
form becomes when SOMETHING is not defined. I will let Gautam handle this issue.
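
To find other files with the same layout, something like the following could help. It is only a rough sketch in Python, not part of the model's build system: the function name `wholly_guarded` and the regular expressions are made up for illustration. It assumes the only case of interest is a file whose non-blank lines all sit inside one outer #if/#ifdef/#ifndef ... #endif block with no #else or #elif branch, and it does not treat comment lines specially.

```python
import re

# Preprocessor directives this sketch looks at (hypothetical helper patterns).
_OPEN = re.compile(r"^\s*#\s*if(def|ndef)?\b")
_ELSE = re.compile(r"^\s*#\s*(else|elif)\b")
_CLOSE = re.compile(r"^\s*#\s*endif\b")


def wholly_guarded(source: str) -> bool:
    """Return True if all non-blank lines sit inside one outer guard.

    Such a file preprocesses to nothing when the guard condition is false,
    which is the situation described in the comment above.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    if not _OPEN.match(lines[0]) or not _CLOSE.match(lines[-1]):
        return False
    depth = 0
    for i, ln in enumerate(lines):
        if _OPEN.match(ln):
            depth += 1
        elif _ELSE.match(ln) and depth == 1:
            # An alternative branch on the outer guard means the file
            # is not empty when the condition is false.
            return False
        elif _CLOSE.match(ln):
            depth -= 1
            # Outer guard closed before the last line: some code is unguarded.
            if depth == 0 and i != len(lines) - 1:
                return False
    return depth == 0
```

Calling `wholly_guarded(path.read_text())` for each source file under a directory (for example via `pathlib.Path.rglob`) would list the candidates to restructure by hand.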

On Thu, Jan 7, 2016 at 10:46 AM, Robert Jacob notifications@github.com
wrote:

@singhbalwinder https://github.com/singhbalwinder you don't have to.
It's assigned to @bishtgautam https://github.com/bishtgautam.



rljacob added a commit that referenced this issue Jul 25, 2016
51f1282 Merge branch 'rljacob/acmepe/use-big-list' (PR #285)
a14a31f Merge branch 'master' of github.com:ESMCI/cime
d518834 minor change for hobart only
605e942 Merge pull request #283 from ESMCI/mira_walltime_fix
5ef8068 Merge pull request #284 from mvertens/newtests
c552fc2 Use single pe layouts file.  Add mpas entries
462810e replaced ww3 with stub wave in several tests that are known to fail
a8b9e7c need to return default walltime
7075ff6 Merge branch 'master' of github.com:CESM-Development/cime
49e3c29 Minor fix
11d61ce Merge pull request #279 from ESMCI/jgfouca/cime_bisect
b15cf46 Upgrade cime_bisect
873856a Merge pull request #278 from jedwards4b/minor_fixes_for_cesm_port
b990f48 add code to cp testreporter script for cesm
6ad54e9 Merge pull request #277 from fischer-ncar/testreporter_fix
be6d1b3 both compare and generate baseline_action_codes are allowed
e76ddc1 Fix for testreporter on hobart.
49fb888 fix issue with _caseroot when cloning
6d29a93 Update testreporter to work with updated ESMCI CIME
b7ed88f Updating to current ESMCI cime master
dbf9a47 Merge pull request #276 from ESMCI/jayeshkrishna/get_mira_working_for_acme
b5a2ce0 Get Mira working with acme
7adda2e updates for cesm allactive tests and compsets
fea102e Merge pull request #263 from ESMCI/sarich/new_acme_config_pes
7e9de96 update config_pes for change from cam5 to cam6
0170f5a rename new xml file to config_pesall.xml until we can sort out which pes descriptions belong here
3edb28f add script for converting the config_pes xml file
72a34bd Add NO_BASELINE_STATUS for BFAIL
83c5b68 Merge branch 'master' of github.com:ESMCI/cime
8d781cf Merge remote-tracking branch 'jedwards/edison_port_fixes' (PR #274)
559ee29 Merge pull request #273 from ESMCI/rljacob/cime/acme-batch
dcf27f6 reduce defaults for driver tests (cesm only) check for pylint earlier
3274c2a Add batch type for all acme machines in config_batch
118bf20 fix calls to read_xml
592ef76 Merge pull request #270 from jedwards4b/mira-workflow
aeac91a cannot write case files if error happens before case is created
45b0eb1 Minor fix
8cb87ff Merge pull request #272 from ESMCI/jgfouca/fix_some_sky_errors
fa09dcb Fix some problems on skybridge
bb6f277 Revert "get correct task_count for each job"
652f9c3 get correct task_count for each job
a9f1679 workflow now working on mira
c94e293 pull from esmci cime5.0.0
ec07a4f remove some pes configurations that muddied things up (BGP, BGQ, TEST)
5eb86bc add in latest changes from acme repo
862335a Add converted acme config_pes.xml file
58c3820 Merge pull request #472 from mvertens/remove_esmf_interfaces
f624721 Removed all code related to using the ESMF interfaces in driver_cpl directory
800e517 removed esmf interfaces

git-subtree-dir: cime
git-subtree-split: 51f1282
@bishtgautam
Contributor

The #ifdef guards have been correctly placed in the various modules.
