optimized send - direct writes for large bitstype arrays #10073

Merged (1 commit merged into JuliaLang:master on Feb 9, 2015)
Conversation

amitmurthy
Contributor

Whenever a request exceeds 100K in serialized length, it is directly written to the socket.

Supersedes #6876; partially addresses #9992.
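
A minimal sketch of that idea, assuming a simple threshold check at send time (hypothetical names such as DIRECT_WRITE_THRESHOLD and send_sketch, written in current Julia syntax rather than the 0.4-era code this PR targeted):

```julia
using Serialization

# ~100K cutoff described above; the constant name is made up for this sketch.
const DIRECT_WRITE_THRESHOLD = 100 * 1024

# Serialize the message, then either aggregate it with other small messages
# in a shared send buffer or push it straight out to the socket.
function send_sketch(sock::IO, sendbuf::IOBuffer, msg)
    iob = IOBuffer()
    serialize(iob, msg)
    payload = take!(iob)
    if length(payload) > DIRECT_WRITE_THRESHOLD
        # Large request: flush whatever has been aggregated so far, then
        # write the serialized bytes directly to the socket.
        write(sock, take!(sendbuf))
        write(sock, payload)
    else
        # Small request: append to the aggregation buffer, to be flushed
        # together with other pending messages later.
        write(sendbuf, payload)
    end
    return nothing
end
```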

@amitmurthy
Contributor Author

@andreasnoack :

Current master:

julia> a=ones(10^7);

julia> println(mean([@elapsed remotecall_fetch(2, (x)->x, a) for i in 1:10]))
0.1583344692

julia> s=[randstring() for x in 1:10^5];

julia> println(mean([@elapsed remotecall_fetch(2, (x)->x, s) for i in 1:10]))
0.2919245722

julia> println(@elapsed [remotecall_fetch(2, myid) for i in 1:10000])
0.618292734

With this PR:

julia> a=ones(10^7);

julia> println(mean([@elapsed remotecall_fetch(2, (x)->x, a) for i in 1:10]))
0.0611091225

julia> s=[randstring() for x in 1:10^5];

julia> println(mean([@elapsed remotecall_fetch(2, (x)->x, s) for i in 1:10]))
0.1801866141

julia> println(@elapsed [remotecall_fetch(2, myid) for i in 1:10000])
0.602078664

@ViralBShah added the parallelism (Parallel or distributed computation) label on Feb 5, 2015
@amitmurthy
Contributor Author

@andreasnoack - could you please test this out? I would like to merge it soon.
@vtjnash / @Keno - could you review the code if possible?

@andreasnoack
Member

Yes. Sorry for not getting back to you on this one. I'll run the benchmarks tomorrow and report back.

@andreasnoack
Member

Here are the results, and they confirm your findings. This is really great. First, a plot of the time it takes to move an array of Float64s to a worker, against the size of the array:
[plot: transfer time vs. array size (amitpull1)]

and next, the timings relative to MPI with network transport (MPI-TCP):
[plot: timings relative to MPI-TCP (amitpull2)]

So instead of being four times slower than MPI, we'll now be two times slower (for large arrays).
cc: @alanedelman

amitmurthy added a commit that referenced this pull request on Feb 9, 2015:
optimized send - direct writes for large bitstype arrays
@amitmurthy merged commit 6558327 into JuliaLang:master on Feb 9, 2015
@ViralBShah
Member

Do we know where we are losing the 2x now?

If we can make this work as well as MPI-tcp, that may lead to reconsidering the darray design. Of course, we would also need efficient implementation of collectives at some basic level.

@amitmurthy
Contributor Author

Not yet, but we will keep chipping away. The IOBuffer used to aggregate messages of up to 100K in length can be replaced by a fixed-length array, which will also remove unnecessary allocations for small requests.

Another TODO is optimizing @everywhere, where we currently serialize the same request multiple times (and send it multiple times, too).
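
As a rough illustration of the fixed-length-buffer idea (hypothetical names, current Julia syntax, not code from this PR):

```julia
# Preallocate the aggregation buffer once and reuse it for every small
# request, instead of allocating per-message buffers.
mutable struct SendBuffer
    data::Vector{UInt8}   # fixed-length backing array
    n::Int                # number of valid bytes currently held
end

SendBuffer(limit::Integer) = SendBuffer(Vector{UInt8}(undef, limit), 0)

# Append serialized bytes in place (bounds/overflow handling omitted);
# the caller writes data[1:n] to the socket and resets n when flushing.
function appendbytes!(buf::SendBuffer, bytes::AbstractVector{UInt8})
    copyto!(buf.data, buf.n + 1, bytes, 1, length(bytes))
    buf.n += length(bytes)
    return buf
end
```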

@JeffBezanson
Member

This is great! Could you briefly explain the reason for introducing BufferedAsyncStream rather than adding this behavior to the existing AsyncStream types?

@amitmurthy
Contributor Author

We should add it to all existing AsyncStream types. In #6876 there was a discussion about having this support for all IO, adding a buffer to File, and getting rid of IOStream entirely, which I am not sure how to go about.

I figured it was best to redo this PR to solve just the one problem of sending large arrays in our parallel infrastructure while the other issues get sorted out.

I can submit a PR that moves the implementation to all AsyncStream types while exporting three new functions: lock, unlock, and set_lockable. The last is needed because, by default, we want AsyncStreams to retain the existing behavior, i.e., no buffering.
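
Intended usage of that proposed API might look roughly like the following (a sketch only: set_lockable and lock/unlock on streams do not exist as described here, and sock, header_bytes, and payload_bytes are placeholders):

```julia
set_lockable(sock, true)     # opt a stream into buffered, lockable writes
lock(sock)                   # begin aggregating writes in the stream's buffer
try
    write(sock, header_bytes)
    write(sock, payload_bytes)
finally
    unlock(sock)             # release the stream and flush the buffered bytes
end
```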
