Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace current_peset_num with 'n' which should be the active pelist. #1246

Merged
merged 1 commit into from
Jun 16, 2023

Conversation

dkokron
Copy link
Contributor

@dkokron dkokron commented Jun 6, 2023

While chasing down the cause of the following error message, I realized that the check to ensure the transmitting rank is part of the peset is not using the peset returned by the get_peset call.

"FATAL from PE 36: MPP_BROADCAST: broadcasting from invalid PE."

I converted part of the UFS/HAFS application to use mpp_broadcast() instead of numerous point-to-point messages. Testing with a HAFS case revealed the above error message. I was able to make the code work by surrounding the mpp_broadcast() call with appropriate mpp_set_current_pelist() calls. A colleague said these mpp_set_current_pelist() calls should not be needed and provided a unit test to prove it.

Further investigation without the mpp_set_current_pelist() scoping calls revealed the issue addressed with the PR.

Fixes #1243

Testing was performance under Ubuntu-22.04 using the Intel 2023.1 suite of compilers and OpenMPI-4.1.5.
I have run the FMS internal check (make check) with and without the change in this PR. No change in the results.
I have also run "make distcheck". The logs config, make, check and distcheck logs are attached.

checkOPT.log
config.log
distcheckOPT.log
makeOPT.log

I also ran a HAFS case with a single nest on the NOAA 'acorn' system. This is the original case that started this whole investigation. That case ran fine using an FMS with the change contained in his PR.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included
  • make distcheck passes

@lharris4
Copy link
Contributor

lharris4 commented Jun 7, 2023

@laurenchilutti I know @bensonr is away for a couple of weeks. Is it possible someone else could review this change?

@laurenchilutti
Copy link
Contributor

@lharris4 I will find someone in the group to review this.

rem1776
rem1776 previously approved these changes Jun 7, 2023
@rem1776
Copy link
Contributor

rem1776 commented Jun 7, 2023

@lharris4 I will find someone in the group to review this.

@thomas-robinson will need to review this to merge since he's the other code owner for this file. I just approved as well, but doesn't make a difference merging-wise.

@lharris4
Copy link
Contributor

lharris4 commented Jun 7, 2023 via email

@thomas-robinson
Copy link
Member

We will discuss this today at our group meeting

@laurenchilutti
Copy link
Contributor

@dkokron how immediately would you like to see this change released in FMS? Would mid July be too late?

@dkokron
Copy link
Contributor Author

dkokron commented Jun 9, 2023 via email

thomas-robinson
thomas-robinson previously approved these changes Jun 9, 2023
@thomas-robinson
Copy link
Member

@dkokron We would prefer adding your single fix to a patch instead of using a workaround if the July timeline is too far in the future. We should be able to sync it up with the merge of NOAA-GFDL/GFDL_atmos_cubed_sphere#272

dkokron pushed a commit to dkokron/GFDL_atmos_cubed_sphere that referenced this pull request Jun 9, 2023
…epends on NOAA-GFDL/FMS#1246 to be functional.  More efficient 'if' test in fill_nested_grid_cpl()
dkokron pushed a commit to dkokron/fv3atm that referenced this pull request Jun 9, 2023
…epends on NOAA-GFDL/FMS#1246 to be functional.  More efficient 'if' test in fill_nested_grid_cpl()
dkokron pushed a commit to dkokron/ufs-weather-model that referenced this pull request Jun 9, 2023
…epends on NOAA-GFDL/FMS#1246 to be functional.  More efficient 'if' test in fill_nested_grid_cpl()
@rem1776 rem1776 changed the base branch from main to release/2023.01 June 12, 2023 17:42
@rem1776 rem1776 dismissed stale reviews from thomas-robinson and themself June 12, 2023 17:42

The base branch was changed.

@laurenchilutti
Copy link
Contributor

@dkokron Please let us know when you hear back from HAFS regarding the timeline in which you need this change. If you do require it earlier then our next planned release of FMS (which is approx. mid July) we would then need about 1 week to issue a patch for you.

@dkokron
Copy link
Contributor Author

dkokron commented Jun 13, 2023

Based on feedback from the HAFS developers, I think it's best to wait for the next scheduled FMS release.

@rem1776 rem1776 changed the base branch from release/2023.01 to main June 13, 2023 14:48
@thomas-robinson
Copy link
Member

Thanks @dkokron we are going to put it into our next round of testing.

bensonr pushed a commit to NOAA-GFDL/GFDL_atmos_cubed_sphere that referenced this pull request Jan 22, 2024
…rent (#272)

* Modifications to replace numerous p2p transfers in fill_nested_grid_cpl by a single MPI_Bcast

* Add reason for including ESMF in fv_control_init

* Add code to cleanup structures associated with Bcast_comm

* Use mpp_broadcast instead of MPI_Bcast at the request of NOAA-GFDL

* Get rid of mpp pelist scoping calls in fill_nested_grid_cpl().  This depends on NOAA-GFDL/FMS#1246 to be functional.  More efficient 'if' test in fill_nested_grid_cpl()

* Bcast_comm and sending_proc variables not needed anymore

* New member of the fv_atmos_type type called BcastMember allows more efficient determination of ranks involved in the mpp_broadcast call in fill_nested_grid_cpl routine

* Port to FMS-2023.02
---------

Co-authored-by: Dan Kokron <Daniel.Kokron@noaa.gov>
jkbk2004 pushed a commit to NOAA-EMC/fv3atm that referenced this pull request Jan 22, 2024
…rent (#743)

* Modifications to replace numerous p2p transfers in fill_nested_grid_cpl by a single MPI_Bcast

* Add reason for including ESMF in fv_control_init. Add code to cleanup structures associated with Bcast_comm

* Get rid of unneeded mpp_sync_self

* Use mpp_broadcast instead of MPI_Bcast at the request of NOAA-GFDL

* Get rid of mpp pelist scoping calls in fill_nested_grid_cpl().  This depends on NOAA-GFDL/FMS#1246 to be functional.  More efficient 'if' test in fill_nested_grid_cpl()

* Bcast_comm and sending_proc variables not needed anymore

* New member of the fv_atmos_type type called BcastMember allows more efficient determination of ranks involved in the mpp_broadcast call in fill_nested_grid_cpl routine

* Port to FMS-2023.02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong peset used in mpp_broadcast_
5 participants