Drain MPI queues before cleanup #100

Merged · merged 1 commit into master from mpi-world-cleanup on May 28, 2021

Conversation

@csegarragonz (Collaborator) commented May 28, 2021

The MpiWorld::destroy() method seemed to work fine until it was tested from faasm in faasm/faasm#420, where I add a barrier before calling destroy(). It turns out our barrier implementation is not fully blocking (ironic, for a barrier).

The way the barrier works (sketched below) is:

  1. Rank 0 receives one message from every other rank.
  2. Once rank 0 has received messages from all ranks, it broadcasts the barrier completion back to them.
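To make the race concrete, here is a minimal, self-contained sketch of that scheme, assuming one in-memory blocking queue per rank. The BlockingQueue class and queue layout are illustrative stand-ins, not faabric's actual types:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Illustrative stand-in for the per-rank in-memory message queues
class BlockingQueue
{
  public:
    // Non-blocking push: the sender never waits for the receiver
    void enqueue(int msg)
    {
        {
            std::lock_guard<std::mutex> lock(mx);
            q.push(msg);
        }
        cv.notify_one();
    }

    // Blocking pop: waits until a message is available
    int dequeue()
    {
        std::unique_lock<std::mutex> lock(mx);
        cv.wait(lock, [this] { return !q.empty(); });
        int msg = q.front();
        q.pop();
        return msg;
    }

    // Non-blocking pop, used later when draining before cleanup
    bool tryDequeue(int& out)
    {
        std::lock_guard<std::mutex> lock(mx);
        if (q.empty()) {
            return false;
        }
        out = q.front();
        q.pop();
        return true;
    }

  private:
    std::queue<int> q;
    std::mutex mx;
    std::condition_variable cv;
};

constexpr int WORLD_SIZE = 4;
std::vector<BlockingQueue> queues(WORLD_SIZE); // one queue per rank

void barrier(int rank)
{
    if (rank == 0) {
        // Step 1: block until every other rank has checked in
        for (int r = 1; r < WORLD_SIZE; r++) {
            queues[0].dequeue();
        }
        // Step 2: the "broadcast" is a series of non-blocking enqueues,
        // so rank 0 leaves the barrier before anyone has dequeued them
        for (int r = 1; r < WORLD_SIZE; r++) {
            queues[r].enqueue(0);
        }
    } else {
        queues[0].enqueue(rank); // check in with rank 0 (non-blocking)
        queues[rank].dequeue();  // block until rank 0's broadcast arrives
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int r = 0; r < WORLD_SIZE; r++) {
        threads.emplace_back(barrier, r);
    }
    for (auto& t : threads) {
        t.join();
    }
    return 0;
}
```

Note that in this sketch rank 0 can return from barrier() while the other ranks' queues still hold its broadcast messages; that window is exactly where the bug lives.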

Before the broadcast is issued, all non-zero ranks are blocked on a recv. Rank 0, however, issues the broadcast as a series of sends, which do not block, so it exits the barrier before the other ranks have actually received the broadcast. If that delivery takes a while and we clear all the in-memory queues in the meantime, we get undefined behaviour. This PR therefore drains the queues before cleaning them up (sketched below).
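For completeness, here is a sketch of the fix implied by the title, reusing the hypothetical BlockingQueue above: before the queues are deleted, whatever is still sitting in them is consumed. This is illustrative only, not the exact change in this commit:

```cpp
// Illustrative fix: empty every in-memory queue before tearing it down,
// so a message still in flight (e.g. rank 0's barrier broadcast) is not
// deleted out from under a rank that has yet to dequeue it
void destroyWorld()
{
    int msg;
    for (auto& q : queues) {
        while (q.tryDequeue(msg)) {
            // discard leftover messages
        }
    }

    // Only now is it safe to delete the queues themselves
    queues.clear();
}
```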

@csegarragonz csegarragonz self-assigned this May 28, 2021
@csegarragonz csegarragonz added the bug (Something isn't working) and mpi (Related to the MPI implementation) labels May 28, 2021
@csegarragonz csegarragonz requested a review from Shillaker May 28, 2021 10:00
@csegarragonz csegarragonz merged commit c5b87ef into master May 28, 2021
@csegarragonz csegarragonz deleted the mpi-world-cleanup branch May 28, 2021 12:42