Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use util::generateGid() for MPI messages #203

Merged
merged 1 commit into from
Dec 27, 2021

Conversation

csegarragonz
Copy link
Collaborator

@csegarragonz csegarragonz commented Dec 22, 2021

The message ID in the MPIMessage protobuf objects is not necessary to guarantee execution integrity. It is just useful for logging and debugging purposes. Further, given that each message has also a sending and receiving rank, a per-rank counter suffices to uniquely identify them. As MPI ranks are single-threaded, we can get away with a thread local counter.

Note that, profiling in high contention experiments, showed a non-negligable amount of time spent in generateGid(). I am not sure what is causing it, but we definately don't need it in the MPI message, thus I remove it.

@csegarragonz csegarragonz self-assigned this Dec 22, 2021
Copy link
Collaborator

@Shillaker Shillaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock acquired in generateGid should only be acquired once per host (as it's just used to set a single static variable), so I'm very surprised that it would be a source of contention. How do you know it's the lock and not the other stuff that happens in that method? In what sort of workload is that the case?

If there is contention it's most likely that the code is broken somehow. Rather than just dodge the issue for MPI, it would be better to fix the underlying GID bottleneck if there is one. Perhaps we could try switching it to use call_once rather than the mutex-based setup we have at the moment, or use a static initialiser, e.g.

unsigned int generateGid()
{
    static size_t gidKeyHash = [](){
         std::string gidKey = faabric::util::randomString(GID_LEN);
         return std::hash<std::string>{}(gidKey);
    } ();
    
    // Existing method   
}

@csegarragonz csegarragonz merged commit 37f8702 into master Dec 27, 2021
@csegarragonz csegarragonz deleted the no-lock-mpimsgid branch December 27, 2021 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants