Remove ocean unused variable RediKappaData #5575

Contributor

@mark-petersen mark-petersen commented Apr 4, 2023

The variable RediKappaData exists in the Registry file but is not used in the code. It appears that the GNU compiler, in optimized mode, removes some underlying information about the array, and the MPI communication then hangs when it tries to communicate the size of the array. When the array is removed, the simulation no longer hangs on start-up.

The variable RediKappaData was meant to support reading a spatially variable data field from input. Since it has not been used for three years, we should remove it. If this feature is needed in the future, it can be added back properly and tested at that time.
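
As a rough illustration of how a declared-but-unused Registry variable could be spotted, here is a minimal sketch. It is not the tooling used for this PR. It assumes Registry entries look like `<var name="..."/>` elements, and the miniature registry, the file name `mpas_ocn_gm.F`, and the helper `unused_registry_vars` are all hypothetical. The match is case-insensitive because Fortran identifiers are.

```python
import re
import xml.etree.ElementTree as ET


def unused_registry_vars(registry_xml, sources):
    """Return names of <var> entries that never appear in any source text.

    registry_xml: Registry contents as a string (assumed <var name="..."/> layout).
    sources: dict mapping a file name to that file's text.
    """
    root = ET.fromstring(registry_xml.strip())
    names = [v.get("name") for v in root.iter("var") if v.get("name")]
    unused = []
    for name in names:
        # Whole-word, case-insensitive match, since Fortran ignores case.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if not any(pattern.search(text) for text in sources.values()):
            unused.append(name)
    return unused


# Hypothetical miniature registry and source, for illustration only.
REGISTRY = """
<registry>
  <var_struct name="diagnostics">
    <var name="RediKappa" type="real" dimensions="nCells Time"/>
    <var name="RediKappaData" type="real" dimensions="nCells Time"/>
  </var_struct>
</registry>
"""
SOURCES = {
    "mpas_ocn_gm.F": "call mpas_pool_get_array(pool, 'RediKappa', RediKappa)",
}

if __name__ == "__main__":
    print(unused_registry_vars(REGISTRY, SOURCES))
```

A real check would need to scan only hand-written source files, since the Registry itself and any code generated from it mention every declared variable.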

Fixes #5574
[BFB]

@mark-petersen
Contributor Author

I tested this PR with the stand-alone ocean compass test ocean/global_ocean/EC30to60/PHC/performance_test. I can confirm that it passes on Chrysalis with gnu-optimized (run on 36 cores). The same set-up hangs on the head of master.

@xylar
Contributor

xylar commented Apr 5, 2023

OMG, thanks for figuring this out @mark-petersen! I'll test this on all the configurations that were giving us trouble as soon as I can.

Contributor

@xylar xylar left a comment

Wonderful, @mark-petersen!

I can confirm that this fixes the issue I have been seeing when running the pr suite on Chrysalis with GNU compilers. With this fix, the suite completes; without it, the suite hangs on the EC30to60 performance test.

I will also test on Perlmutter (when available) and Chicoma (tomorrow), but I think the fix on Chrysalis alone is enough to justify approving this change.

@xylar
Contributor

xylar commented Apr 12, 2023

Unfortunately, this doesn't seem to completely solve the issues on Perlmutter or Chicoma. I'm still seeing PIO errors (but only in optimized runs). See, for example, on Perlmutter:

/pscratch/sd/x/xylar/compass_1.2/test_20230412/pr_rm_redi/ocean/global_ocean/EC30to60/PHC/performance_test/forward

and on Chicoma:

/lustre/scratch4/turquoise/xylar/compass_1.2/chicoma-cpu/test_20230412/pr_rm_redi/ocean/global_ocean/EC30to60/PHC/performance_test/forward

jonbob added a commit that referenced this pull request Apr 12, 2023
…5575)

Remove ocean unused variable RediKappaData

The variable RediKappaData exists in the Registry file but is not used
in the code. It appears that the gnu compiler removes some underlying
information about the array in optimized mode, and then the MPI
communication hangs when it tries to communicate the size of the array.
When the array is removed the simulation no longer hangs on start-up.

Fixes #5574
[BFB]

@jonbob
Contributor

jonbob commented Apr 12, 2023

passes:

  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
  • SMS_D_Ld1.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod
  • ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel

merged to next

@jonbob jonbob merged commit 5918d07 into E3SM-Project:master Apr 13, 2023

xylar added a commit to xylar/compass that referenced this pull request Apr 18, 2023
This merge updates the E3SM-Project submodule from [c292bec000](https://github.com/E3SM-Project/E3SM/tree/c292bec000) to [4b3e611fee](https://github.com/E3SM-Project/E3SM/tree/4b3e611fee).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ]  (ocn) E3SM-Project/E3SM#5418
- [ ]  (ocn) E3SM-Project/E3SM#5447
- [ ]  (ocn) E3SM-Project/E3SM#5568
- [ ]  (ocn) E3SM-Project/E3SM#5583
- [ ]  (ocn) E3SM-Project/E3SM#5575
- [ ]  (ocn) E3SM-Project/E3SM#5600
Labels
BFB PR leaves answers BFB bug fix PR mpas-ocean
Development

Successfully merging this pull request may close these issues.

Ocean hangs on start-up during MPI communication
3 participants