Improved Heartbeat Write Timer Handling #636
- Resolves an issue with the Heartbeat Write Timer repeatedly firing and blocking on the synchronization lock. The timer will now fire once and be restarted (if not disposed) at the end of method processing.
It appears builds are failing due to configuration issues?
@ash-ricado you are correct, CI needs propping every so often. We will QA this, don't worry.
@ash-ricado any way we can reasonably reliably reproduce this?
I think this looks fine to me apart from the interval change which I don't know the reason for.
    {
        _heartbeatReadTimer = new Timer(HeartbeatReadTimerCallback);

        _heartbeatReadTimer.Change(300, Timeout.Infinite);
Why the change from 200 to 300 here?
I simply changed this to prevent both the Read and Write Timers firing at the same time. I was considering using Random.Next(100,300) to ensure that with multiple connections, the timers will fire at different times.
Not required to fix the issue this PR deals with. More than happy to remove if desired.
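For illustration only, the randomized due time mentioned above might be sketched as follows. HeartbeatJitter and NextDueTime are hypothetical names that do not appear in this PR or in the client library; the 100 and 300 bounds are taken from the comment. Note that Random.Next(min, max) excludes the upper bound.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the client library.
public static class HeartbeatJitter
{
    private static readonly Random Rng = new Random();
    private static readonly object RngLock = new object();

    // Returns a due time in milliseconds in the range [minMs, maxMs).
    // Random is not thread-safe, so access is serialized with a lock.
    public static int NextDueTime(int minMs = 100, int maxMs = 300)
    {
        lock (RngLock)
        {
            return Rng.Next(minMs, maxMs);
        }
    }
}
```

Under that assumption, the call in the diff would become _heartbeatReadTimer.Change(HeartbeatJitter.NextDueTime(), Timeout.Infinite), so that timers on separate connections are less likely to fire together.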
We strongly discourage heartbeat timeouts < 5 seconds so any timer interval < 1s should be reasonable, and anything < 500 ms is optimal IMO.
This looks reasonable. @ash-ricado please undo the interval change (or justify it) and provide some detail on the problem this PR addresses.
Let's wait for some extra details first.
@michaelklishin Apologies for the delay in my response.
Problem solved by this PR: identified through our use of remote nodes that talk to a RabbitMQ cluster over connections that can sometimes become unstable. Monitoring allowed us to see the ThreadPool thread count increase over a period of an hour until the maximum number of threads was consumed.
Should be able to reliably reproduce by publishing data (12K or larger) to a channel every 5 seconds, then dropping the network interface that provides access to the RabbitMQ server. The Heartbeat Write Timer should continue to fire and wait on the lock.
NOTE: This does rely on the socket hanging after an unclean network disconnection. If the socket can be shut down cleanly, the attempt to write frames will return immediately.
OK, so injecting a latency spike with Toxiproxy or similar would do. Thank you.
Proposed Changes
Modified synchronization and parameters of the Write Heartbeat Timer to ensure it can only fire once. It will then be started again at the end of the callback.
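As a rough sketch of the fire-once-then-re-arm pattern described above (not the actual diff from this PR): the class name OneShotHeartbeatWriter, its fields, and the writeHeartbeat delegate are hypothetical stand-ins. Only System.Threading.Timer and Timeout.Infinite are real APIs here.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the connection's heartbeat writer; names are assumptions.
public sealed class OneShotHeartbeatWriter : IDisposable
{
    private readonly object _timerLock = new object();
    private readonly Action _writeHeartbeat; // hypothetical: writes one heartbeat frame
    private readonly int _intervalMs;
    private readonly Timer _timer;
    private bool _disposed;

    public OneShotHeartbeatWriter(Action writeHeartbeat, int intervalMs)
    {
        _writeHeartbeat = writeHeartbeat;
        _intervalMs = intervalMs;
        // Create the timer disarmed, then arm it once. A period of
        // Timeout.Infinite means it fires a single time and does not repeat.
        _timer = new Timer(Callback, null, Timeout.Infinite, Timeout.Infinite);
        _timer.Change(_intervalMs, Timeout.Infinite);
    }

    private void Callback(object state)
    {
        try
        {
            // May block if the socket hangs. Because the timer is one-shot,
            // no further callbacks are queued while this call is stuck.
            _writeHeartbeat();
        }
        catch (Exception)
        {
            // A real client would handle the failure; the sketch ignores it.
        }
        finally
        {
            // Re-arm only after the work has finished, and only if not disposed.
            lock (_timerLock)
            {
                if (!_disposed)
                {
                    _timer.Change(_intervalMs, Timeout.Infinite);
                }
            }
        }
    }

    public void Dispose()
    {
        lock (_timerLock)
        {
            if (_disposed) return;
            _disposed = true;
            _timer.Dispose();
        }
    }
}
```

With a periodic timer, each tick would take another ThreadPool thread that waits on the lock while the first write is stuck. In this sketch at most one callback is in flight per timer, so a hung write holds a single thread.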
This resolves an issue that can occur when writing a heartbeat frame to the socket fails (e.g. broken connection) and the Write Heartbeat Timer continues to fire and block on the lockable _heartbeatWriteLock object. Over time this would eventually exhaust the ThreadPool if the socket write didn't return.
Types of Changes
What types of changes does your code introduce to this project?
Put an x in the boxes that apply
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md document
Further Comments