optimized send - direct writes for large bitstype arrays #10073
Conversation
Current master:
With this PR:
@andreasnoack - could you please test this out? I would like to merge it soon.
Yes. Sorry for not getting back to you on this one. I'll run the benchmarks tomorrow and report back.
Here are the results, and they confirm your findings. This is really great. First is the plot of the time it takes to move an array, and next the timings relative to MPI with network transport (MPI-TCP). So instead of being four times slower than MPI, we'll now be two times slower (for large arrays).
Do we know where we are losing the 2x now? If we can make this work as well as MPI-TCP, that may lead to reconsidering the darray design. Of course, we would also need an efficient implementation of collectives at some basic level.
Not yet, but we will keep chipping away. The IOBuffer used to aggregate messages up to 100K in length can be replaced by a fixed-length array, which will remove unnecessary allocations for small requests too. Optimizing …
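The fixed-length aggregation buffer suggested above can be sketched generically: preallocate one reusable buffer instead of creating a fresh growable IOBuffer per request, flushing to the socket only when it fills. This is an illustrative Python sketch, not Julia's implementation; the class name, `capacity` default, and `sock_write` callback are assumptions for demonstration.

```python
class FixedSendBuffer:
    """Sketch: a preallocated, reusable send buffer (hypothetical names;
    illustrates the fixed-length-array idea from the discussion above)."""

    def __init__(self, sock_write, capacity=100 * 1024):
        self.sock_write = sock_write     # callable that performs the socket write
        self.buf = bytearray(capacity)   # allocated once, reused across requests
        self.pos = 0

    def write(self, data: bytes):
        if len(data) > len(self.buf):
            # Too big to buffer at all: drain pending bytes, then send directly.
            self.flush()
            self.sock_write(bytes(data))
            return
        if self.pos + len(data) > len(self.buf):
            self.flush()                 # drain before the buffer would overflow
        self.buf[self.pos:self.pos + len(data)] = data
        self.pos += len(data)

    def flush(self):
        if self.pos:
            self.sock_write(bytes(self.buf[:self.pos]))
            self.pos = 0
```

Because the backing `bytearray` is allocated once, small requests reuse it rather than triggering a new allocation each time, which is the "unnecessary allocs" saving mentioned above.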
This is great! Could you briefly explain the reason to introduce …
We should add it to all existing AsyncStream types. In #6876 there was a discussion about having this support for all IO, adding a buffer to … I figured it best to redo the PR to just solve the single problem of sending large arrays in our parallel infrastructure while the other issues get sorted out. I can submit a PR that will move the implementation to all AsyncStream types, while exporting 3 new functions - …
Whenever a request exceeds 100K in serialized length, it is directly written to the socket.
Supersedes #6876, partially addresses #9992.
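The dispatch rule in the PR description — buffer small requests, write large ones straight to the socket — can be illustrated with a short sketch. This is a language-agnostic Python illustration of the scheme, not the PR's Julia code; `BUFFER_THRESHOLD`, `send_message`, and the `sock_write` callback are assumed names.

```python
import io

BUFFER_THRESHOLD = 100 * 1024  # the 100K cutoff described in the PR


def send_message(sock_write, payload: bytes) -> str:
    """Route a serialized request by size (illustrative sketch)."""
    if len(payload) > BUFFER_THRESHOLD:
        # Large bitstype arrays: skip the intermediate copy and write
        # the serialized bytes directly to the socket.
        sock_write(payload)
        return "direct"
    # Small requests: aggregate into a buffer so many tiny writes
    # become a single larger socket write.
    buf = io.BytesIO()
    buf.write(payload)
    sock_write(buf.getvalue())
    return "buffered"
```

The win for large arrays comes from avoiding one full copy of the data into an intermediate buffer before it reaches the socket.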