-
Notifications
You must be signed in to change notification settings - Fork 885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mpi/finalized: don't hang if called during MPI_FINALIZE #5092
Conversation
@bosilca Can you review? This is somewhat tricky thread stuff. |
I was also looking for a solution to this problem, and the current status of the code is, as you noted, tricky and difficult. I would like to propose instead to ditch the mutex and rely on atomic operations to change and inquire for the state of the library. I was wondering if there is a reason to use such a complicated scheme, instead of relying on atomic swaps to change the state of the library ? |
@bosilca I think we didn't use atomics originally because there's 2 values -- initialized and finalized. Perhaps they could be combined down to a single enum value...? That would allow for atomics to be used. Perhaps we could have the following values:
This would be a bit of a bigger change, but it should allow for atomics and less (no?) locking. This might become trickier in a sessions-based world, but that's probably still years off...? |
@bosilca What do you think of the above proposal? |
Your proposal looks totally reasonable, matching what I was having in mind. I would like to help, let me know how. |
@bosilca I worked up a (not quite-yet-finished/proofed/etc.) patch for this proposal. However, there may be a fatal flaw in this scheme: there's now an atomic in It's late, and I'm tired, so I'm not thinking this through well right now -- but I wonder if it's ok to not use an atomic here to read the value -- i.e., can we just look at I.e., everywhere else I obtain the value, I call |
In talking with @hjelmn, he tells me:
This means that the Hrpmh. |
f0420d0
to
c9a4b29
Compare
Better idea: we only set the value of the global volatile PR updated to do this. I still need to double check that I got all the conversions correct from the original code, but at least this major problem should be solved. |
c9a4b29
to
0052d7a
Compare
Nice, the approach looks good. I will be reviewing the code shortly. |
0052d7a
to
aa4238e
Compare
@bosilca Code cleaned up; ready for review. |
817b815
to
7991483
Compare
@@ -149,7 +149,9 @@ void ompi_mpi_errors_return_win_handler(struct ompi_win_t **win, | |||
|
|||
static void out(char *str, char *arg) | |||
{ | |||
if (ompi_rte_initialized && !ompi_mpi_finalized) { | |||
int32_t state = ompi_mpi_state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the temporary state
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was going for commonality of style in all places (even if there was only one use of ompi_mpi_state
), assuming that the compiler would optimize it out. I can remove the temporary variables in those instances.
@@ -69,7 +70,8 @@ mca_io_romio314_file_close (ompi_file_t *fh) | |||
which we obviously can't do if we've started to MPI_Finalize). | |||
The user didn't close the file, so they should expect | |||
unexpected behavior. */ | |||
if (ompi_mpi_finalized) { | |||
int32_t state = ompi_mpi_state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly to above. As you only have a single use of this variable you can use the ompi_mpi_state
directly.
@@ -58,7 +58,9 @@ int MPI_Get_library_version(char *version, int *resultlen) | |||
(i.e., use a NULL communicator, which will end up at the | |||
default errhandler, which is abort). */ | |||
|
|||
if (ompi_mpi_initialized && !ompi_mpi_finalized) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not related to your patch, it is more general about the real meaning of the get_library_version function. In OMPI in particular why are we protecting this function ? All the information used here is statically known, we don't have to be between MPI_Init and MPI_Finalize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not - see the comment above this that explains (short version: we're protecting the invocation of error handling if version
or resultlen
is NULL -- we have to handle the error differently if we're before INIT, afterINITbeforeFINALIZE, or after FINALIZE).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that. thanks.
@@ -74,8 +67,7 @@ int MPI_Initialized(int *flag) | |||
} | |||
} | |||
|
|||
*flag = ompi_mpi_initialized; | |||
opal_mutex_unlock(&ompi_mpi_bootstrap_mutex); | |||
*flag = (state >= OMPI_MPI_STATE_INIT_COMPLETED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the param check is off, Is this really enough or should we check that the state is smaller than OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The param check is only checking if they passed NULL for flag
.
This function is supposed to return if MPI_INIT has completed or not. It says nothing about MPI_FINALIZE.
...am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, my bad.
ompi/runtime/ompi_mpi_finalize.c
Outdated
@@ -124,28 +124,30 @@ int ompi_mpi_finalize(void) | |||
ompi_mpi_finalize(). Hence, if we get it, then no other thread | |||
is inside the critical section (and we don't have to check the | |||
*_started bool variables). */ | |||
opal_mutex_lock(&ompi_mpi_bootstrap_mutex); | |||
if (!ompi_mpi_initialized || ompi_mpi_finalized) { | |||
opal_atomic_rmb(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the discussion I gathered we don't need rmb ... If we really need rmb() then it deserves a comment explaining why.
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See open-mpi#5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
7991483
to
35438ae
Compare
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@bosilca All fixed. Added one additional commit to remove some (unrelated) dead code in |
@@ -74,8 +67,7 @@ int MPI_Initialized(int *flag) | |||
} | |||
} | |||
|
|||
*flag = ompi_mpi_initialized; | |||
opal_mutex_unlock(&ompi_mpi_bootstrap_mutex); | |||
*flag = (state >= OMPI_MPI_STATE_INIT_COMPLETED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, my bad.
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
(Note: this commit would have been squashed with the prior commit, but open-mpi#5092 was accidentally merged before the 2 commits were squashed together) Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 9b9cb5f)
@jsquyres It looks like some of the finalized comparisons are not consistent. Was this intended? For example ompi_mpi_finalized was changed in the following files... in hcoll_module.c there is:
But in io_romio314_file_open.c there is:
|
I believe that in most places we replaced |
Okay, just wanted to make sure these different comparisons were all intended, as it seemed like it could potentially be the source of a nasty race condition if that was not the intention. |
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".
This commit prevents MPI_FINALIZED from hanging while trying to obtain
the ompi_mpi_bootstrap_mutex lock (because ompi_mpi_finalize() has
already obtained it, and it's not safe for it to let the lock go).
Thanks to @AndrewGaspar for reporting the issue.
Signed-off-by: Jeff Squyres jsquyres@cisco.com
Note that this is somewhat pedantic because user attribute destruction callbacks will only ever be invoked when MPI_FINALIZED is false (i.e., [effectively] before MPI_FINALIZE).
Refs #5084.