
mpi/finalized: don't hang if called during MPI_FINALIZE #5092

Merged
merged 2 commits into from
Jun 3, 2018

Conversation

jsquyres
Member

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false".

This commit prevents MPI_FINALIZED from hanging while trying to obtain
the ompi_mpi_bootstrap_mutex lock (because ompi_mpi_finalize() has
already obtained it, and it's not safe for it to let the lock go).

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres jsquyres@cisco.com

Note that this is somewhat pedantic because user attribute destruction callbacks will only ever be invoked when MPI_FINALIZED is false (i.e., [effectively] before MPI_FINALIZE).

Refs #5084.

@jsquyres jsquyres added the bug label Apr 24, 2018
@jsquyres jsquyres requested a review from bosilca April 24, 2018 17:15
@jsquyres
Member Author

@bosilca Can you review? This is somewhat tricky thread stuff.

@bosilca
Member

bosilca commented Apr 24, 2018

I was also looking for a solution to this problem, and the current state of the code is, as you noted, tricky and difficult. I would like to propose instead that we ditch the mutex and rely on atomic operations to change and inquire about the state of the library.

Is there a reason to use such a complicated scheme, instead of relying on atomic swaps to change the state of the library?

@jsquyres
Member Author

jsquyres commented Apr 25, 2018

@bosilca I think we didn't use atomics originally because there are two values -- initialized and finalized.

Perhaps they could be combined down to a single enum value...? That would allow for atomics to be used. Perhaps we could have the following values:

  1. Not initialized
  2. Initialization started
  3. Initialization completed
  4. Finalization started
  5. Finalization past COMM_SELF destruction
    • Not 100% sure this one is needed, but it might be...?
    • EDIT: Yes, this one is needed -- MPI_FINALIZED needs to return FALSE prior to this state, and TRUE after this state.
  6. Finalization completed

This would be a bit of a bigger change, but it should allow for atomics and less (no?) locking. This might become trickier in a sessions-based world, but that's probably still years off...?
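A minimal C sketch of the proposed state machine (the OMPI_MPI_STATE_* spellings follow the names visible in the eventual patch hunks in this PR, but the exact numeric values and the helper functions here are illustrative, not the merged code):

```c
#include <stdint.h>

/* Illustrative lifecycle enum.  The ordering matters: the old
   ompi_mpi_initialized / ompi_mpi_finalized booleans become
   ">=" comparisons against these values. */
typedef enum {
    OMPI_MPI_STATE_NOT_INITIALIZED = 0,
    OMPI_MPI_STATE_INIT_STARTED,
    OMPI_MPI_STATE_INIT_COMPLETED,
    OMPI_MPI_STATE_FINALIZE_STARTED,
    OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    OMPI_MPI_STATE_FINALIZE_COMPLETED
} ompi_state_t;

/* MPI_INITIALIZED: true once init has completed; says nothing
   about finalize. */
static int is_initialized(int32_t state) {
    return state >= OMPI_MPI_STATE_INIT_COMPLETED;
}

/* MPI_FINALIZED: must stay false while the COMM_SELF attribute
   destruction callbacks run at the start of MPI_FINALIZE
   (MPI-3.1:8.7.1), hence the PAST_COMM_SELF_DESTRUCT threshold. */
static int is_finalized(int32_t state) {
    return state >= OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

With a single ordered value, both queries reduce to one plain integer comparison, which is what makes a lock-free scheme possible.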

@jsquyres
Member Author

@bosilca What do you think of the above proposal?

@bosilca
Member

bosilca commented May 1, 2018

Your proposal looks totally reasonable, matching what I had in mind. I would like to help; let me know how.

@jsquyres
Member Author

@bosilca I worked up a (not quite-yet-finished/proofed/etc.) patch for this proposal.

However, there may be a fatal flaw in this scheme: there's now an atomic in OMPI_ERR_INIT_FINALIZE(), which is called in just about every MPI API function.

It's late, and I'm tired, so I'm not thinking this through well right now -- but I wonder if it's ok to not use an atomic here to read the value -- i.e., can we just look at ompi_mpi_state in this function without using an atomic function to obtain it?

I.e., everywhere else I obtain the value, I call opal_atomic_fetch_add_32(&ompi_mpi_state, 0) -- i.e., fetch it and add 0. Is that the Right way to atomically obtain the value? Or can I just read the value (if I don't want to change the value -- I just want to read it)?
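For what it's worth, in C11 terms the two reads return the same value; fetch-and-add-zero is a read-modify-write, while a plain atomic load is the idiomatic way to read without modifying. A sketch using C11 `<stdatomic.h>` (standing in for the opal_atomic_*() wrappers, whose semantics may differ in detail):

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_int_fast32_t demo_state = 3;  /* stand-in for ompi_mpi_state */

/* fetch-add-zero: atomically adds 0 and returns the prior value.
   Correct, but a read-modify-write -- it typically needs exclusive
   ownership of the cache line on every call. */
static int32_t read_via_fetch_add(void) {
    return (int32_t) atomic_fetch_add(&demo_state, 0);
}

/* Plain atomic load: same value, no write traffic. */
static int32_t read_via_load(void) {
    return (int32_t) atomic_load(&demo_state);
}
```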

@jsquyres
Member Author

In talking with @hjelmn, he tells me:

  • We normally just read the variable to read a value that was set atomically.
  • ...but appropriate memory barriers must be in place.

This means that the OMPI_ERR_INIT_FINALIZE() macro could remove the opal_atomic_*() call, but we'd need to put in an RMB. I don't know if that's actually a net performance win or not.

Hrpmh.

@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch from f0420d0 to c9a4b29 Compare May 23, 2018 17:42
@jsquyres
Member Author

Better idea: we only set the value of the global volatile ompi_mpi_state in 2 functions (ompi_mpi_init() and ompi_mpi_finalize()), so do the appropriate wmb's there. Then we can just read ompi_mpi_state everywhere else without an associated rmb. Problem solved!

PR updated to do this.

I still need to double check that I got all the conversions correct from the original code, but at least this major problem should be solved.
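A sketch of that write-side-barrier pattern in C11 terms (again using `<stdatomic.h>` as a stand-in for opal's primitives; a release store is roughly "wmb + store", and a relaxed load compiles to a plain load on mainstream CPUs):

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_int_fast32_t ompi_mpi_state;  /* stand-in for the real global */

/* Only ompi_mpi_init() / ompi_mpi_finalize() change the state, so the
   write-side barrier lives here. */
static void set_state(int32_t new_state) {
    atomic_store_explicit(&ompi_mpi_state, new_state, memory_order_release);
}

/* Hot paths (e.g., OMPI_ERR_INIT_FINALIZE()) just read the value;
   no read-side barrier needed. */
static int32_t get_state(void) {
    return (int32_t) atomic_load_explicit(&ompi_mpi_state,
                                          memory_order_relaxed);
}
```

(Strictly speaking, a relaxed read orders nothing with respect to the writer's other stores; the argument above is that the hot paths only consume the state value itself, not data published alongside it.)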

@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch from c9a4b29 to 0052d7a Compare May 23, 2018 17:51
@bosilca
Member

bosilca commented May 23, 2018

Nice, the approach looks good. I will be reviewing the code shortly.

@jsquyres
Member Author

@bosilca Code cleaned up; ready for review.

@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch 3 times, most recently from 817b815 to 7991483 Compare May 25, 2018 19:34
@@ -149,7 +149,9 @@ void ompi_mpi_errors_return_win_handler(struct ompi_win_t **win,

static void out(char *str, char *arg)
{
if (ompi_rte_initialized && !ompi_mpi_finalized) {
int32_t state = ompi_mpi_state;
Member

no need for the temporary state variable.

Member Author

Yeah, I was going for commonality of style in all places (even if there was only one use of ompi_mpi_state), assuming that the compiler would optimize it out. I can remove the temporary variables in those instances.

@@ -69,7 +70,8 @@ mca_io_romio314_file_close (ompi_file_t *fh)
which we obviously can't do if we've started to MPI_Finalize).
The user didn't close the file, so they should expect
unexpected behavior. */
if (ompi_mpi_finalized) {
int32_t state = ompi_mpi_state;
Member

similarly to above. As you only have a single use of this variable you can use the ompi_mpi_state directly.

@@ -58,7 +58,9 @@ int MPI_Get_library_version(char *version, int *resultlen)
(i.e., use a NULL communicator, which will end up at the
default errhandler, which is abort). */

if (ompi_mpi_initialized && !ompi_mpi_finalized) {
Member

This comment is not related to your patch, it is more general about the real meaning of the get_library_version function. In OMPI in particular why are we protecting this function ? All the information used here is statically known, we don't have to be between MPI_Init and MPI_Finalize.

Member Author

We're not -- see the comment above this that explains (short version: we're protecting the invocation of error handling if version or resultlen is NULL; we have to handle the error differently if we're before INIT, after INIT / before FINALIZE, or after FINALIZE).

Member

I missed that. thanks.

@@ -74,8 +67,7 @@ int MPI_Initialized(int *flag)
}
}

*flag = ompi_mpi_initialized;
opal_mutex_unlock(&ompi_mpi_bootstrap_mutex);
*flag = (state >= OMPI_MPI_STATE_INIT_COMPLETED);
Member

Assuming the param check is off, is this really enough, or should we check that the state is smaller than OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT?

Member Author

The param check is only checking if they passed NULL for flag.

This function is supposed to return if MPI_INIT has completed or not. It says nothing about MPI_FINALIZE.

...am I missing something?

Member

Nope, my bad.

@@ -124,28 +124,30 @@ int ompi_mpi_finalize(void)
ompi_mpi_finalize(). Hence, if we get it, then no other thread
is inside the critical section (and we don't have to check the
*_started bool variables). */
opal_mutex_lock(&ompi_mpi_bootstrap_mutex);
if (!ompi_mpi_initialized || ompi_mpi_finalized) {
opal_atomic_rmb();
Member

From the discussion I gathered we don't need rmb ... If we really need rmb() then it deserves a comment explaining why.

Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE).  In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE.  See
open-mpi#5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres jsquyres force-pushed the pr/finalized-attribute-fix branch from 7991483 to 35438ae Compare June 1, 2018 20:36
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres
Member Author

jsquyres commented Jun 1, 2018

@bosilca All fixed. Added one additional commit to remove some (unrelated) dead code in ompi_mpi_finalize.c.


@jsquyres jsquyres merged commit a1737ca into open-mpi:master Jun 3, 2018
@jsquyres jsquyres deleted the pr/finalized-attribute-fix branch June 3, 2018 17:10
jsquyres added a commit to jsquyres/ompi that referenced this pull request Jun 7, 2018
(Note: this commit would have been squashed with the prior commit, but
open-mpi#5092 was accidentally merged
before the 2 commits were squashed together)

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 9b9cb5f)
@sam6258
Contributor

sam6258 commented Sep 4, 2018

@jsquyres It looks like some of the finalized comparisons are not consistent. Was this intended?

For example ompi_mpi_finalized was changed in the following files...

in hcoll_module.c there is:

if (ompi_mpi_state >= OMPI_MPI_STATE_FINALIZE_STARTED) {

But in io_romio314_file_open.c there is:

if (ompi_mpi_state >= OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT) { return OMPI_SUCCESS; }

@jsquyres
Member Author

jsquyres commented Sep 4, 2018

I believe that in most places we replaced ompi_mpi_finalized with >= OMPI_MPI_STATE_FINALIZE_STARTED, but I have a (very dim) recollection that we used >= _PAST_COMM_SELF_DESTRUCT with ROMIO because ROMIO chains some shutdown stuff to the destruction of COMM_SELF.
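The window where the two thresholds disagree is exactly the COMM_SELF teardown phase; a small illustrative comparison (enum values assumed to be in ascending lifecycle order, as in the merged patch; the helper functions are hypothetical):

```c
#include <stdint.h>

/* Lifecycle states, in ascending order (illustrative values). */
typedef enum {
    OMPI_MPI_STATE_INIT_COMPLETED = 2,
    OMPI_MPI_STATE_FINALIZE_STARTED = 3,
    OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT = 4
} ompi_state_t;

/* hcoll-style check: true as soon as finalize begins. */
static int finalize_started(int32_t state) {
    return state >= OMPI_MPI_STATE_FINALIZE_STARTED;
}

/* ROMIO-style check: stays false while COMM_SELF (and the I/O
   cleanup chained to its destruction) is still being torn down. */
static int past_comm_self(int32_t state) {
    return state >= OMPI_MPI_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

During a COMM_SELF attribute destruction callback, the state is FINALIZE_STARTED, so the first check fires and the second does not.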

@sam6258
Contributor

sam6258 commented Sep 4, 2018

Okay, just wanted to make sure these different comparisons were all intended, as it seemed like it could potentially be the source of a nasty race condition if that was not the intention.
