Error propagation in free_fn callback for generalized requests #11681
Comments
@jsquyres Could you please reopen this issue? Things still do not work quite as I would expect. Here is another reproducer:
#include <stdio.h>
#include <mpi.h>
static int query_fn (void *ctx, MPI_Status *s) { return MPI_SUCCESS; }
static int free_fn (void *ctx) { return MPI_ERR_OTHER; } // <-- RETURN WITH FAILURE !!!
static int cancel_fn (void *ctx, int c) { return MPI_SUCCESS; }
int main(int argc, char *argv[])
{
int ierr;
MPI_Request request;
MPI_Init(&argc, &argv);
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &request);
MPI_Grequest_complete(request);
{
ierr = MPI_Wait(&request, MPI_STATUS_IGNORE);
printf("After Wait() - error: %d, active: %d\n", ierr, request!=MPI_REQUEST_NULL);
}
if (MPI_REQUEST_NULL != request) {
ierr = MPI_Request_free(&request);
printf("After Free() - error: %d, active: %d\n", ierr, request!=MPI_REQUEST_NULL);
}
MPI_Finalize();
return 0;
}
I'm getting the following output:
As you can see, I'm still getting an error after |
The grequest provided free function returned an error, why would you expect the request to be |
The free function is a user thing used to release user resources. If the free function somehow fails to release these user resources, there is nothing else to do with the user stuff. However, MPI could happily continue and release internal MPI stuff, and then, at the end, return an error code to signal the user free failure. If you do not like/want the above behavior, then at least don't make Please look at my reproducer above.
That is, the wait call deallocates everything MPI-internal, then completes with error and sets Again, if that is not possible, what I'm asking for is to at least make the example print
that is, |
FWIW, the MPICH behavior is for EDIT: Same thing for |
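The behavior requested in the comments above (run the user's free callback, release the library-internal state regardless of the callback's result, reset the handle, and only then report the callback's error) can be sketched with a toy model. This is a minimal illustration under stated assumptions, not Open MPI's or MPICH's implementation: every `toy_` name and the `TOY_` constants are hypothetical stand-ins for the corresponding MPI concepts, and no real MPI call is used.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MPI_SUCCESS and MPI_ERR_OTHER. */
enum { TOY_SUCCESS = 0, TOY_ERR_OTHER = 1 };

/* Same shape as the free_fn callback in the reproducer. */
typedef int (*toy_free_fn)(void *ctx);

/* Hypothetical "library-internal" request object. */
typedef struct toy_request {
    toy_free_fn free_fn;
    void *ctx;
} toy_request;

/* Hypothetical stand-in for MPI_REQUEST_NULL. */
#define TOY_REQUEST_NULL ((toy_request *)NULL)

/* Create a request that remembers the user's free callback. */
toy_request *toy_request_start(toy_free_fn free_fn, void *ctx)
{
    toy_request *req = malloc(sizeof *req);
    if (req != NULL) {
        req->free_fn = free_fn;
        req->ctx = ctx;
    }
    return req;
}

/* Complete the request: call the user's callback, then release the
 * internal object and null the handle even if the callback failed,
 * and finally return the callback's error code to the caller. */
int toy_wait(toy_request **handle)
{
    assert(handle != NULL && *handle != TOY_REQUEST_NULL);
    int rc = (*handle)->free_fn((*handle)->ctx);
    free(*handle);              /* internal cleanup is unconditional */
    *handle = TOY_REQUEST_NULL; /* handle is inactive afterwards */
    return rc;                  /* user error is still reported */
}

/* A callback that always fails, like free_fn in the reproducer. */
int toy_failing_free(void *ctx)
{
    (void)ctx;
    return TOY_ERR_OTHER;
}
```

In this model, calling `toy_wait` on a request created with `toy_failing_free` returns `TOY_ERR_OTHER` and leaves the handle equal to `TOY_REQUEST_NULL`, so a follow-up check like the reproducer's `if (MPI_REQUEST_NULL != request)` would be skipped. Whether a real MPI implementation may or should behave this way is exactly what the thread is debating.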
This is with branch v5.0.x, but I guess main has the same issue (currently, I'm unable to build main with internal pmix on Fedora 38; I'll submit another issue).
Reproducer
Expected behavior
The reproducer should abort (via default error handler).
Actual behavior
The reproducer runs to completion with success and no output.
The error code from free_fn is not propagated to MPI_Wait as required by the MPI standard.