
mpi/finalized: don't hang if called during MPI_FINALIZE #5092

Merged
merged 2 commits into from
Jun 3, 2018

Conversation

jsquyres
Member

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".

This commit prevents MPI_FINALIZED from hanging while trying to obtain
the ompi_mpi_bootstrap_mutex lock (because ompi_mpi_finalize() has
already obtained it, and it's not safe for it to let the lock go).

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres jsquyres@cisco.com

Note that this is somewhat pedantic because user attribute destruction callbacks will only ever be invoked when MPI_FINALIZED is false (i.e., [effectively] before MPI_FINALIZE).

Refs #5084.

@jsquyres jsquyres added the bug label Apr 24, 2018
@jsquyres jsquyres requested a review from bosilca April 24, 2018 17:15
@jsquyres
Member Author

@bosilca Can you review? This is somewhat tricky thread stuff.

@bosilca
Member

bosilca commented Apr 24, 2018

I was also looking for a solution to this problem, and the current state of the code is, as you noted, tricky and difficult. I would like to propose instead that we ditch the mutex and rely on atomic operations to change and inquire about the state of the library.

Is there a reason to use such a complicated scheme, instead of relying on atomic swaps to change the state of the library?

@jsquyres
Member Author

jsquyres commented Apr 25, 2018

@bosilca I think we didn't use atomics originally because there are two values -- initialized and finalized.

Perhaps they could be combined down to a single enum value...? That would allow for atomics to be used. Perhaps we could have the following values:

  1. Not initialized
  2. Initialization started
  3. Initialization completed
  4. Finalization started
  5. Finalization past COMM_SELF destruction
    • Not 100% sure this one is needed, but it might be...?
    • EDIT: Yes, this one is needed -- MPI_FINALIZED needs to return FALSE prior to this state, and TRUE after this state.
  6. Finalization completed

This would be a bit of a bigger change, but it should allow for atomics and less (no?) locking. This might become trickier in a sessions-based world, but that's probably still years off...?
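A minimal C sketch of the proposed state machine (the OMPI_MPI_STATE_* spellings follow the names visible in the eventual patch hunks in this PR, but the exact numeric values and the helper functions here are illustrative, not the merged code):

```c
#include <stdint.h>

/* Illustrative lifecycle enum.  The ordering matters: the old
   ompi_mpi_initialized / ompi_mpi_finalized booleans become
   ">=" comparisons against these values. */
typedef enum {
    OMPI_MPI_STATE_NOT_INITIALIZED = 0,
    OMPI_MPI_STATE_INIT_STARTED,
    OMPI_MPI_STATE_INIT_COMPLETED,
    OMPI_MPI_STATE_FINALIZE_STARTED,
    OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    OMPI_MPI_STATE_FINALIZE_COMPLETED
} ompi_state_t;

/* MPI_INITIALIZED: true once init has completed; says nothing
   about finalize. */
static int is_initialized(int32_t state) {
    return state >= OMPI_MPI_STATE_INIT_COMPLETED;
}

/* MPI_FINALIZED: must stay false while the COMM_SELF attribute
   destruction callbacks run at the start of MPI_FINALIZE
   (MPI-3.1:8.7.1), hence the PAST_COMM_SELF_DESTRUCT threshold. */
static int is_finalized(int32_t state) {
    return state >= OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

With a single ordered value, both queries reduce to one plain integer comparison, which is what makes a lock-free scheme possible.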

@jsquyres
Member Author

@bosilca What do you think of the above proposal?

@bosilca
Member

bosilca commented May 1, 2018

Your proposal looks totally reasonable, matching what I had in mind. I would like to help; let me know how.

@jsquyres
Member Author

@bosilca I worked up a (not quite-yet-finished/proofed/etc.) patch for this proposal.

However, there may be a fatal flaw in this scheme: there's now an atomic in OMPI_ERR_INIT_FINALIZE(), which is called in just about every MPI API function.

It's late, and I'm tired, so I'm not thinking this through well right now -- but I wonder if it's ok to not use an atomic here to read the value -- i.e., can we just look at ompi_mpi_state in this function without using an atomic function to obtain it?

I.e., everywhere else I obtain the value, I call opal_atomic_fetch_add_32(&ompi_mpi_state, 0) -- i.e., fetch it and add 0. Is that the Right way to atomically obtain the value? Or can I just read the value (if I don't want to change the value -- I just want to read it)?
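For what it's worth, in C11 terms the two reads return the same value; fetch-and-add-zero is a read-modify-write, while a plain atomic load is the idiomatic way to read without modifying. A sketch using C11 `<stdatomic.h>` (standing in for the opal_atomic_*() wrappers, whose semantics may differ in detail):

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_int_fast32_t demo_state = 3;  /* stand-in for ompi_mpi_state */

/* fetch-add-zero: atomically adds 0 and returns the prior value.
   Correct, but a read-modify-write -- it typically needs exclusive
   ownership of the cache line on every call. */
static int32_t read_via_fetch_add(void) {
    return (int32_t) atomic_fetch_add(&demo_state, 0);
}

/* Plain atomic load: same value, no write traffic. */
static int32_t read_via_load(void) {
    return (int32_t) atomic_load(&demo_state);
}
```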

@jsquyres
Member Author

In talking with @hjelmn, he tells me:

  • We normally just read the variable to read a value that was set atomically.
  • ...but appropriate memory barriers must be in place.

This means that the OMPI_ERR_INIT_FINALIZE() macro could remove the opal_atomic_*() call, but we'd need to put in an RMB. I don't know if that's actually a net performance win or not.

Hrpmh.

@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch from f0420d0 to c9a4b29 Compare May 23, 2018 17:42
@jsquyres
Member Author

Better idea: we only set the value of the global volatile ompi_mpi_state in 2 functions (ompi_mpi_init() and ompi_mpi_finalize()), so do the appropriate wmb's there. Then we can just read ompi_mpi_state everywhere else without an associated rmb. Problem solved!

PR updated to do this.

I still need to double check that I got all the conversions correct from the original code, but at least this major problem should be solved.
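A sketch of that write-side-barrier pattern in C11 terms (again using `<stdatomic.h>` as a stand-in for opal's primitives; a release store is roughly "wmb + store", and a relaxed load compiles to a plain load on mainstream CPUs):

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_int_fast32_t ompi_mpi_state;  /* stand-in for the real global */

/* Only ompi_mpi_init() / ompi_mpi_finalize() change the state, so the
   write-side barrier lives here. */
static void set_state(int32_t new_state) {
    atomic_store_explicit(&ompi_mpi_state, new_state, memory_order_release);
}

/* Hot paths (e.g., OMPI_ERR_INIT_FINALIZE()) just read the value;
   no read-side barrier needed. */
static int32_t get_state(void) {
    return (int32_t) atomic_load_explicit(&ompi_mpi_state,
                                          memory_order_relaxed);
}
```

(Strictly speaking, a relaxed read orders nothing with respect to the writer's other stores; the argument above is that the hot paths only consume the state value itself, not data published alongside it.)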

@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch from c9a4b29 to 0052d7a Compare May 23, 2018 17:51
@bosilca
Member

bosilca commented May 23, 2018

Nice, the approach looks good. I will be reviewing the code shortly.

@jsquyres
Member Author

@bosilca Code cleaned up; ready for review.

@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch 3 times, most recently from 817b815 to 7991483 Compare May 25, 2018 19:34
@@ -149,7 +149,9 @@ void ompi_mpi_errors_return_win_handler(struct ompi_win_t **win,

static void out(char *str, char *arg)
{
if (ompi_rte_initialized && !ompi_mpi_finalized) {
int32_t state = ompi_mpi_state;
Member

no need for the temporary state variable.

Member Author

Yeah, I was going for commonality of style in all places (even if there was only one use of ompi_mpi_state), assuming that the compiler would optimize it out. I can remove the temporary variables in those instances.

@@ -69,7 +70,8 @@ mca_io_romio314_file_close (ompi_file_t *fh)
which we obviously can't do if we've started to MPI_Finalize).
The user didn't close the file, so they should expect
unexpected behavior. */
if (ompi_mpi_finalized) {
int32_t state = ompi_mpi_state;
Member

similarly to above. As you only have a single use of this variable you can use the ompi_mpi_state directly.

@@ -58,7 +58,9 @@ int MPI_Get_library_version(char *version, int *resultlen)
(i.e., use a NULL communicator, which will end up at the
default errhandler, which is abort). */

if (ompi_mpi_initialized && !ompi_mpi_finalized) {
Member

This comment is not related to your patch, it is more general about the real meaning of the get_library_version function. In OMPI in particular why are we protecting this function ? All the information used here is statically known, we don't have to be between MPI_Init and MPI_Finalize.

Member Author

We're not -- see the comment above this that explains (short version: we're protecting the invocation of error handling if version or resultlen is NULL; we have to handle the error differently if we're before INIT, after INIT / before FINALIZE, or after FINALIZE).

Member

I missed that. thanks.

@@ -74,8 +67,7 @@ int MPI_Initialized(int *flag)
}
}

*flag = ompi_mpi_initialized;
opal_mutex_unlock(&ompi_mpi_bootstrap_mutex);
*flag = (state >= OMPI_MPI_STATE_INIT_COMPLETED);
Member

Assuming the param check is off, is this really enough, or should we check that the state is smaller than OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT?

Member Author

The param check is only checking if they passed NULL for flag.

This function is supposed to return if MPI_INIT has completed or not. It says nothing about MPI_FINALIZE.

...am I missing something?

Member

Nope, my bad.

@@ -124,28 +124,30 @@ int ompi_mpi_finalize(void)
ompi_mpi_finalize(). Hence, if we get it, then no other thread
is inside the critical section (and we don't have to check the
*_started bool variables). */
opal_mutex_lock(&ompi_mpi_bootstrap_mutex);
if (!ompi_mpi_initialized || ompi_mpi_finalized) {
opal_atomic_rmb();
Member

From the discussion I gathered we don't need rmb ... If we really need rmb() then it deserves a comment explaining why.

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE).  In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE.  See
open-mpi#5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch from 7991483 to 35438ae Compare June 1, 2018 20:36
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres
Member Author

jsquyres commented Jun 1, 2018

@bosilca All fixed. Added one additional commit to remove some (unrelated) dead code in ompi_mpi_finalize.c.


@jsquyres jsquyres merged commit a1737ca into open-mpi:master Jun 3, 2018
@jsquyres jsquyres deleted the pr/finalized-attribute-fix branch June 3, 2018 17:10
jsquyres added a commit to jsquyres/ompi that referenced this pull request Jun 7, 2018
(Note: this commit would have been squashed with the prior commit, but
open-mpi#5092 was accidentally merged
before the 2 commits were squashed together)

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 9b9cb5f)
@sam6258
Contributor

sam6258 commented Sep 4, 2018

@jsquyres It looks like some of the finalized comparisons are not consistent. Was this intended?

For example ompi_mpi_finalized was changed in the following files...

in hcoll_module.c there is:

if (ompi_mpi_state >= OMPI_MPI_STATE_FINALIZE_STARTED) {

But in io_romio314_file_open.c there is:

if (ompi_mpi_state >= OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT) { return OMPI_SUCCESS; }

@jsquyres
Member Author

jsquyres commented Sep 4, 2018

I believe that in most places we replaced ompi_mpi_finalized with >= OMPI_MPI_STATE_FINALIZE_STARTED, but I have a (very dim) recollection that we used >= _PAST_COMM_SELF_DESTRUCT with ROMIO because ROMIO chains some shutdown stuff to the destruction of COMM_SELF.
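The window where the two thresholds disagree is exactly the COMM_SELF teardown phase; a small illustrative comparison (enum values assumed to be in ascending lifecycle order, as in the merged patch; the helper functions are hypothetical):

```c
#include <stdint.h>

/* Lifecycle states, in ascending order (illustrative values). */
typedef enum {
    OMPI_MPI_STATE_INIT_COMPLETED = 2,
    OMPI_MPI_STATE_FINALIZE_STARTED = 3,
    OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT = 4
} ompi_state_t;

/* hcoll-style check: true as soon as finalize begins. */
static int finalize_started(int32_t state) {
    return state >= OMPI_MPI_STATE_FINALIZE_STARTED;
}

/* ROMIO-style check: stays false while COMM_SELF (and the I/O
   cleanup chained to its destruction) is still being torn down. */
static int past_comm_self(int32_t state) {
    return state >= OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

During a COMM_SELF attribute destruction callback, the state is FINALIZE_STARTED, so the first check fires and the second does not.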

@sam6258
Contributor

sam6258 commented Sep 4, 2018

Okay, just wanted to make sure these different comparisons were all intended, as it seemed like it could potentially be the source of a nasty race condition if that was not the intention.
