Channel and RemoteChannel #8507
I like this a lot. I wonder if |
My preference would be for the latter, i.e., |
👍 Shouldn't |
The constructors available are
The channel type defines the type of the backing vector, and |
Why not make |
OK. |
Nice! I have pretty much the same implementation of Channels here: https://github.com/shashi/CSP.jl/blob/master/src/CSP.jl. The operators on it are very similar to Channels in Golang or Clojure's core.async. My thinking was that Channels should be decoupled from Transports, i.e. instead of having a RemoteChannel type, it would be nice to just have Channels and be able to pipe channels in and out of different transports (e.g. a TCP transport or a ZMQ socket). A transport is just any object which supports This is nice because then you get the power of map, reduce, filter etc. without having to rewrite it for each kind of channel. Connecting things on different machines becomes as easy as it should be, I think. |
RemoteChannels are for inter-process (worker) communication while Channels are for inter-task communication. The Channel implementation uses a circular buffer and grows on demand up to its maximum size. RemoteChannels are in effect just a remote reference to a single, common Channel. Serializing/deserializing just sends the reference coordinates (where, whence, id) between processes. So, "pipe channels in and out of different transports" does not make sense, unless you mean piping references to the channel: the channel data should only ever be on one process. In future, we could make it possible to seamlessly "send" regular |
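As a rough illustration of "a remote reference to a single, common Channel", here is a minimal sketch. It assumes the `RemoteChannel` constructor from the `Distributed` standard library as it exists in current Julia, which may differ from the API proposed in this PR. The backing channel is created on the current process only to keep the sketch self-contained.

```julia
using Distributed

# Assumption: current-Julia API. The closure builds the backing Channel on
# the process given by the second argument (here, the current process).
rc = RemoteChannel(() -> Channel{Int}(4), myid())

# Operations on the handle are forwarded to the process that owns the data.
put!(rc, 42)
ready = isready(rc)        # data is available in the backing Channel
value = take!(rc)          # removes the item from the backing Channel
empty_after = !isready(rc) # nothing left once the item is taken
```

Passing `rc` to another worker would serialize only the reference, so every holder of the handle talks to the same backing Channel.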
Decoupling channels from transports as we go forward does seem like a good idea. One could then even use an MPI transport, in addition to tcp/zmq etc. |
I assume that N tasks (or processes) could each be waiting on the same channel object
Each time a request is added to c, only one of the take! calls will return. With threading, we would require appropriate locking in the implementation. |
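A minimal sketch of that behaviour, assuming the `Channel` API of current Julia rather than the exact implementation in this PR. The names `requests`, `served` and `workers` are made up for illustration. Several tasks block on the same channel, and each item is delivered to exactly one of them.

```julia
# Hypothetical names; Channel API as in current Julia.
requests = Channel{Int}(8)
served = Channel{Tuple{Int,Int}}(32)   # (worker id, request) pairs

# Three tasks all wait on the same channel. Iterating a Channel calls
# take! repeatedly and stops once the channel is closed and drained.
workers = map(1:3) do id
    @async for req in requests
        put!(served, (id, req))
    end
end

for req in 1:6
    put!(requests, req)
end
close(requests)
foreach(wait, workers)
close(served)

# Every request shows up exactly once, whichever worker handled it.
handled = sort([req for (_, req) in served])
```

Which worker receives which request is up to the scheduler, so the sketch only checks that nothing is duplicated or lost.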
Oops, I deleted my comment by mistake. That makes sense! Thanks.
Edit: For the record, previous comment was:
I don't really understand what sending Channels across processes entails. Why use a circular buffer when push!, shift! and length feel more natural and can cut down on a lot of bookkeeping? I am confused about fetch and take!... Say I have a web server with N tasks (or eventually, say with RemoteChannels, N processes or threads) listening for requests on a Channel{Request} object
# a worker
c::Channel{Request}
@async while true
    serve(take!(c))
end
Now when a Request is put on the channel, all the N tasks get notified. They will all try to shift messages off the queue until there is nothing left on it, correct? Will this work safely if @async spawned a new thread instead of a task? It would be awesome if we get guaranteed request-stealing kind of load balancing for free here. |
In order to have a single Channel type across tasks and processes, how about the following:
Users will always instantiate channels with the The act of sending a
|
This sounds good! I guess transports are a different story unrelated to this PR as they should be. 👍 And ChannelRef seems like the bookkeeping necessary along with a transport which can do Would it be really hard to get rid of |
Yes, just a boolean type parameter defining the type of reference being held by As for We could also parametrize on size, and have a different implementation for channels of size 1. And implement |
I notice I don't think it's any better to have |
Yes. That is the third TODO mentioned in PR description above. |
2ef98c5 to 0388647
Have made changes as discussed above. A few minor things:
I was just wondering whether, in order to keep
This is not the case, as only references to the shared array are sent over, but the name suggests otherwise. In order to avoid a similar confusion here, I am wondering whether to change
|
Bump @JeffBezanson . I will rebase and resolve conflicts if this is OK. |
Bump. It would be good to merge to unblock follow-on work. |
Bump again. @JeffBezanson @StefanKarpinski Should we get this in and tweak it later? |
Let's hold off for a few days. I am working on abstracting out the transport part from More like a That will bring up the issue of what the correct interface for a Channel should be - a Let me get that PR out and we can then continue this discussion. |
#9046 introduces a new type This PR is ready for review. |
This has trailing whitespace so fails travis, and |
tonotify = []
# delete this worker from our RemoteChannel client sets
ids = {} | ||
tonotify = {} |
deprecated syntax
Fixed. Thanks. |
Travis build errors seem unrelated. Win32 went through. |
Isn't this now in a package? Can it be closed? |
Similar functionality is available in https://github.com/JuliaParallel/MessageUtils.jl However, I do feel that we should have channel functionality in Base itself. Channels are a very useful feature for inter-task communications where producers / consumers do not have to co-ordinate with each other. And it changes RemoteRef from “shared queue of length 1” to "shared queue of an arbitrary length". I can rework this PR against the current master. One change I would like is to retain the name |
Is this something we should be targeting for 0.4? |
I think this is a good idea, but I worry about how multithreading is going to affect the API. Go has a channel abstraction that is only for talking between threads in the same process. This is similar, but for talking between tasks either on the same process or across processes (potentially across a network). Do we think this model will survive as is once we have threading? |
Yes. Will update this PR. |
Actually, Channels are optimized for inter-task communication, while a RemoteChannel is more like a handle to a Channel on a different process. In Erlang, every task has its own message queue, and it is trivial to send a message to a remote task's message queue. Architectures that rely on messaging between lots of tasks will have a need for channels. Multi-threading will reduce the need for shared arrays, but for message-driven architectures, where users do not want to deal with locking/unlocking shared resources, message queues are a better model. A contrived example with tasks and threads - but actually quite a common pattern.
|
Message-queue-based communication between lots of very lightweight tasks, which are usually waiting for some event to happen (I/O, timer, etc.), is different from multi-threading for taking advantage of multiple cores. Tasks blocking on libuv events are OK. We can have thousands of them in the same thread. Multi-threading will be used either a) for spreading computation across cores or b) as in the database example above, since, AFAIK, the standard model in ODBC is thread-blocking calls. |
Closed in favor of #12042 |
Revisiting the idea to have support for "channels" in Base.
Channel is used for inter-task communication. Implementation is in task.jl
put! without blocking. Defaults to 1.
Any.
ClusterManager interface change, where multiple tasks work in parallel totally asynchronously, unlike the produce/consume pattern which works best where both the producer and consumer work in lock-step.
put! blocks when a channel is full. take! and wait block when it is empty. isready denotes availability of data in the channel.
RemoteRef has been renamed to RemoteChannel. Like local channels, they too have a size and type associated with it. Defaults to 1 and Any. Unlike a Channel, a RemoteChannel is accessible across processes.
TODO:
RemoteChannel to use a Channel
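The blocking rules in the description (put! blocks on a full channel, take! blocks on an empty one, isready reports availability) can be sketched as follows. This assumes the `Channel` API of current Julia, not necessarily the code in this PR, and relies on `@async` tasks running cooperatively on the same thread.

```julia
# A channel that holds at most two items before put! blocks.
c = Channel{Int}(2)
put!(c, 1)
put!(c, 2)
full_and_ready = isready(c)      # data is available

# A third put! cannot complete until a slot frees up, so run it in a task.
producer = @async put!(c, 3)
yield()                          # let the producer run until it blocks
blocked = !istaskdone(producer)  # still waiting for space

first_item = take!(c)            # frees one slot and wakes the producer
wait(producer)                   # the pending put! can now finish

rest = [take!(c), take!(c)]      # remaining items, in FIFO order
drained = !isready(c)            # a further take! would block here
```

A channel of size 1, the default mentioned in the description, behaves the same way with a single slot.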