-
Notifications
You must be signed in to change notification settings - Fork 885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix error handling for generalized requests #12392
base: main
Are you sure you want to change the base?
Conversation
dd44db1
to
8725699
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in responding; I had to go refresh myself on all these details.
This is unfortunately harping on some grey areas in the MPI standard.
- Why does Open MPI require 2 calls to actually free the resources? MPICH apparently does not.
- I think that this is an area of ambiguity in the standard. Which of Open MPI or MPICH is right? I'm not sure.
- For better or worse, Open MPI's current intended usage is like this:
// Or the equivalent in sessions MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN); int err = MPI_Wait(&req); if (MPI_SUCCESS != err && req_is_generalized_request) { // Try again, because we might be in the case // where the user-defined free function failed err = MPI_Request_free(&req); } if (MPI_SUCCESS != err) { // handle error }
- In the above code sample, why return an error from
MPI_REQUEST_FREE
if there was no error in freeing the internal MPI data structures?- Yeah, you're right: that's not intended. If
MPI_REQUEST_FREE
is able to free everything successfully, it should returnMPI_SUCCESS
. It shouldn't be affected by the previous error returned by the user's free function.
- Yeah, you're right: that's not intended. If
- In the above code sample, why doesn't
MPI_REQUEST_FREE
set the request toMPI_REQUEST_NULL
?- Yeah, you're right: that's not intended. If
MPI_REQUEST_FREE
is able to free everything successfully, it should set the INOUTrequest
parameter toMPI_REQUEST_NULL
.
- Yeah, you're right: that's not intended. If
I.e., a fix should probably ensure that when the user's free function returns an error:
- We set the user's free function pointer to NULL
- We return that error back up out the initial MPI function that was invoked (e.g.,
MPI_WAIT
, in the sample code above). Depending on the first MPI function that was invoked, this may mean returning the value from the function or setting the value on the appropriatestatus.MPI_ERROR
(if notMPI_STATUS_IGNORE
). - When
MPI_REQUEST_FREE
is invoked, free the rest of the internal OMPI data structures. Assuming that happens properly, returnMPI_SUCCESS
. This might be a little complicated if we set the initial error -- from the user's free function -- onstatus.MPI_ERROR
. I think in this case, we should resetstatus.MPI_ERROR
back toMPI_SUCCESS
. Or, at least, the non-SUCCESS value onstatus.MPI_ERROR
shouldn't cause us to go into error handling code in the internals ofMPI_REQUEST_FREE
.
Finally, I'll note that the intended behavior didn't make it into documentation anywhere. Sigh. We definitely need to add this to the docs somewhere -- the individual TEST/WAIT man pages are probably the right place to document this behavior.
@@ -532,7 +532,7 @@ static inline int ompi_request_complete(ompi_request_t* request, bool with_signa | |||
|
|||
ompi_wait_sync_t *tmp_sync = (ompi_wait_sync_t *) OPAL_ATOMIC_SWAP_PTR(&request->req_complete, | |||
REQUEST_COMPLETED); | |||
if( REQUEST_PENDING != tmp_sync ) { | |||
if (REQUEST_PENDING != tmp_sync && REQUEST_COMPLETED != tmp_sync) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that a core premise of ompi_request_complete()
is that it should only be invoked once for a given request. It wasn't intended to be safe to be called multiple times -- e.g., the first time, it actually completes the request, and subsequent times it effectively does nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ompi_request_complete()
called a second time should either fail cleanly returning error, or just happily continue as a non-op because the request already completed, but it should NEVER segfault as would happen currently if the user call MPI_Request_complete()
two times. The MPI standard does not say whether the routine can be called multiple times on the same request handle. With this fix, I'm just playing safe and allowing multiple calls to MPI_Request_complete()
. For reference, IIRC, MPICH errors not the second but the third time, this is something I was planning to report and/or eventually fix.
IMHO, it is probably easier to just allow multiple MPI_Request_complete()
calls on a request, just like I'm doing here. If you do not like such semantics and prefer to error, I believe the fix is trivial.
@@ -57,6 +57,10 @@ int MPI_Request_free(MPI_Request *request) | |||
} | |||
|
|||
rc = ompi_request_free(request); | |||
if (OPAL_UNLIKELY(OMPI_SUCCESS != rc && MPI_REQUEST_NULL != *request)) { | |||
(*request)->req_status.MPI_ERROR = rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did ompi_request_free(request)
already set this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the generic user-level MPI_Request_free()
for all kinds of requests, not only generalized requests. I'm not so expert with the Open MPI implementation to 100% sure ompi_request_free()
will set the status error field on errors. Without that guarantee, my use of ompi_errhandler_request_invoke()
may not work as I intended. Note that ompi_errhandler_request_invoke()
has the side-effect of cleaning up the request and deallocating the object. Sorry, it's a bit complicated, but I had to work within the constraints of what I was given.
For example, look at ompi_comm_request_free()
... I don't see that routine setting the status error field. Therefore, I think being overzealous here is justified. IMHO, this line is a good one and should stay. Otherwise, all internal request types used in Open MPI (and future ones that may be implemented) should be audited and eventually fixed to make sure a failing free sets the error field.
@jsquyres I think I need to clarify what this PR is exactly doing, which IMHO is an improvement over previous behavior. When
Therefore, from (3), there is no need at all to call You can play with the following simple reproducer. I used a much convoluted example (using multiple calls to #include <assert.h>
#include <stdio.h>
#include <mpi.h>
static int query_fn (void *ctx, MPI_Status *s) { return MPI_SUCCESS; }
static int free_fn (void *ctx) { return MPI_ERR_OTHER; }
static int cancel_fn (void *ctx, int c) { return MPI_SUCCESS; }
int main(int argc, char *argv[])
{
int ierr;
MPI_Request request = MPI_REQUEST_NULL;
MPI_Init(&argc, &argv);
MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &request);
ierr = MPI_Grequest_complete(request);
assert (MPI_SUCCESS == ierr);
//int flag = 0;
//ierr = MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
ierr = MPI_Wait(&request, MPI_STATUS_IGNORE);
//ierr = MPI_Request_free(&request);
assert (MPI_SUCCESS != ierr); // check that the call above actually failed
assert (MPI_REQUEST_NULL == request); // check that the request handle was deallocated.
MPI_Finalize();
return 0;
} |
8725699
to
846a3ba
Compare
Gentle requests to keep this PR moving forward. FYI, mpi4py has very good testing of error handling for generalized requests, albeit disabled waiting for the fixes from this PR. See the following link: if greq:
greq.Free() are no longer required, as the previous |
846a3ba
to
4012bed
Compare
4012bed
to
7273500
Compare
A second call to ompi_request_complete() segfaults. Make it mostly an non-op. Alternatively, the second invocation could generate an error, but let's play safe for now. Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
If ompi_request_free() fails, use the ompi_errhandler_request_invoke() machinery used in the wait/test routines. This will have the side-effect of attempting to free the request a second time. Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
d73fb7a
to
40a500d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this slipped by. One point where I'm not sure (and probably just needs a comment)
/** | ||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this change (since it's not relevant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? I'm fixing a inconsistency in the comment style respect to the other block comments in the same source file. Am I supposed to submit a separate PR for such trivial fixes? Or should these inconsistencies left alone to live forever?
{ | ||
int rc = OMPI_SUCCESS; | ||
|
||
if (REQUEST_COMPLETE(&greq->greq_base) && greq->greq_user_freed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only invoke free if the user called MPI_Request_free
(i.e., from ompi_grequest_free
)? Shouldn't free be invoked on completion (i.e., from ompi_grequest_complete
) as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devreal Please read the large comment below this function, startin with MPI has some weird semantics with respect to generalized requests .... And no, completion should not trigger free by itself, completion is sort of flagging the thing as done, such that the progress engine running in another thread knows that the thing is completed. What makes things convoluted is that a user could Free
without a more usual Wait
or Test
call, therefore extra care is needed.
This is a followup of #11683 from @bosilca addressing outstanding issues related to error handling with generalized requests. Some of these issues where reported here #11681 (comment). I did my best to produce separate commits setting the stage for the fixes specific to generalized requests. I believe this PR is easier to review if you go commit by commit.
Throughout testing is available in this mpi4py branch to be merged once this PR lands in v5.0.x.
A test run of that mpi4py branch against this PR was submitted here.