Pool workers to handle incoming deliveries, ACKs etc. #906

Closed
stebet opened this issue Jul 8, 2020 · 7 comments

Comments

@stebet
Contributor

stebet commented Jul 8, 2020

After reading through a few issues (#876, #874) and doing some profiling I think I've found what the underlying issue is. Would love to get input from @danielmarbach, @bollhals and @bording on this.

Creating a new IModel currently requires creating a new ConsumerDispatcher, which creates a WorkPool to handle incoming deliveries, ACKs etc. This is all backed by a Channel and a Task that reads from that channel and executes the work items. The problem is that every time a new IModel is created, a new Task is created as well, which requires scheduling, and that doesn't happen immediately, especially under high churn. This effectively makes it fairly expensive to both create and dispose of an IModel instance. You can easily see this by profiling a program that does nothing more than create and dispose a lot of IModel instances on the latest versions.
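For reference, the per-model WorkPool shape being described is roughly "a Channel plus one long-running reader Task". This is only a simplified sketch to illustrate the per-model cost, not the client's actual implementation; the type and member names here are made up:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Simplified, illustrative sketch of the "one Channel + one reader Task per IModel"
// shape described above. The real ConsumerDispatcher/WorkPool differs in detail.
internal sealed class WorkPoolSketch
{
    private readonly Channel<Func<Task>> _channel =
        Channel.CreateUnbounded<Func<Task>>(new UnboundedChannelOptions { SingleReader = true });
    private Task _readerLoop;

    // Every new IModel ends up starting a loop task like this one,
    // which is what makes model churn comparatively expensive.
    public void Start() => _readerLoop = Task.Run(LoopAsync);

    public void Enqueue(Func<Task> work) => _channel.Writer.TryWrite(work);

    public void Stop() => _channel.Writer.Complete();

    private async Task LoopAsync()
    {
        while (await _channel.Reader.WaitToReadAsync().ConfigureAwait(false))
        {
            while (_channel.Reader.TryRead(out Func<Task> work))
            {
                await work().ConfigureAwait(false);
            }
        }
    }
}
```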

I think it would be very beneficial to keep a pool of WorkPool-like worker tasks on the Connection itself. The IConsumerDispatchers could grab a worker from that pool and hand it the Work items to execute, instead of scheduling a new Task for every model. The new concurrent processing configuration by @danielmarbach might need a little extra thought to handle parallel processing of deliveries per IModel using a shared pool, but if the Semaphores are still kept on the dispatchers (which are per model), and those in turn decide how many tasks to hand over to the shared worker pool in parallel, I think that should still work fine.

So basically I'm proposing something like this:

  • Connection will have a shared pool of workers (initial size = Environment.ProcessorCount?, max size = something sane) to handle incoming actions/promises.
  • Model has a dispatcher which knows how many actions/promises it can process concurrently.
  • When a model has work to do (BasicDeliver, BasicAck etc.) it acquires its Semaphore, grabs a worker from the connection worker pool (or creates a new one if none is available, perhaps?), and schedules the work, handing over its Semaphore as well. When the worker is done executing, the Semaphore is released and the worker is returned to the pool. That means that if a new worker was created, the pool effectively grows larger as needed, and unless a lot of deliveries are taking place we should seldom need to spin up a new worker task; in the worst case, if we have emptied the pool, the task is scheduled to the ThreadPool just like it was before. (A rough sketch follows this list.)
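A rough sketch of what that could look like, assuming the shared pool is simply a Channel drained by a fixed number of worker tasks, and each dispatcher keeps a SemaphoreSlim to bound its in-flight work. All type names here (SharedWorkerPool, ModelDispatcherSketch) are hypothetical, not part of the client, and growing the pool on demand is omitted for brevity:

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch only: a connection-wide pool of worker tasks plus a
// per-model SemaphoreSlim that decides how many work items a given model may
// have in flight at once.
internal sealed class SharedWorkerPool
{
    private readonly Channel<Func<Task>> _work = Channel.CreateUnbounded<Func<Task>>();

    public SharedWorkerPool(int workerCount)
    {
        // Initial size = Environment.ProcessorCount in the proposal above.
        for (int i = 0; i < workerCount; i++)
        {
            _ = Task.Run(async () =>
            {
                await foreach (Func<Task> item in _work.Reader.ReadAllAsync().ConfigureAwait(false))
                {
                    await item().ConfigureAwait(false);
                }
            });
        }
    }

    public void Schedule(Func<Task> work) => _work.Writer.TryWrite(work);
}

internal sealed class ModelDispatcherSketch
{
    private readonly SharedWorkerPool _pool;
    private readonly SemaphoreSlim _concurrency;

    public ModelDispatcherSketch(SharedWorkerPool pool, int consumerDispatchConcurrency)
    {
        _pool = pool;
        _concurrency = new SemaphoreSlim(consumerDispatchConcurrency);
    }

    // Called for BasicDeliver, BasicAck etc.; the semaphore is released once the
    // shared worker has finished executing the work item.
    public async Task DispatchAsync(Func<Task> work)
    {
        await _concurrency.WaitAsync().ConfigureAwait(false);
        _pool.Schedule(async () =>
        {
            try { await work().ConfigureAwait(false); }
            finally { _concurrency.Release(); }
        });
    }
}
```

Usage would then be something like one `new SharedWorkerPool(Environment.ProcessorCount)` per connection, with each model creating its own `ModelDispatcherSketch` over that shared pool.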

This makes sense in my head at least, and could also increase throughput for high-churn scenarios.

@michaelklishin
Member

Note that at some point in the past we had a dispatch pool per connection. I don't have the issue where this was changed at hand right now, but the reasoning was to improve concurrency for some workloads. We expected that it would be problematic for high channel churn workloads, but those workloads are problematic in many other ways. Channels are not meant to be very short-lived.

@stebet
Contributor Author

stebet commented Jul 8, 2020

Completely understand. In that case, this might be totally unnecessary, but at least it's food for thought :)

@bollhals
Contributor

bollhals commented Jul 8, 2020

Channels are not meant to be very short lived.

Also remember that since PR #878 you can share the channel from multiple threads, so opening / closing channels should also be needed less than before.

@stebet Do you have any data that shows where we spend how much time? (Just curious)

@bojanz27

Also remember that since PR #878 you can share the channel from multiple threads, so opening / closing channels should also be needed less than before.

If that is so, why not state it explicitly in the changelog? The phrasing that it's now "safer but should be avoided" only adds to the confusion.

@bollhals
Contributor

Also remember that since PR #878 you can share the channel from multiple threads, so opening / closing channels should also be needed less than before.

If that is so, why not state it explicitly in the changelog? The phrasing that it's now "safer but should be avoided" only adds to the confusion.

There is still the part about publish acknowledgments, which is not thread-safe yet. But I agree the documentation could be updated to reflect this more accurately.

@stebet
Contributor Author

stebet commented Oct 18, 2021

Channels are not meant to be very short lived.

Also remember that since PR #878 you can share the channel from multiple threads, so opening / closing channels should also be needed less than before.

@stebet Do you have any data that shows where we spend how much time? (Just curious)

Not recent data, no. I'll take a look and get up to speed soon. I've been busy with a bunch of other projects recently, but I haven't forgotten you guys :)

@lukebakken
Contributor

Closing due to lack of activity and my wish to get as many issues as possible closed prior to the v7 release.
