Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Heartbeat Write Timer Handling #636

Merged
merged 1 commit into from
Apr 8, 2019
Merged

Improved Heartbeat Write Timer Handling #636

merged 1 commit into from
Apr 8, 2019

Conversation

ashneilson
Copy link
Contributor

@ashneilson ashneilson commented Apr 3, 2019

  • Resolves an issue with the Heartbeat Write Timer repeatedly firing and blocking on the sychronization lock. The timer will now fire once and be restarted (if not disposed) at the end of method processing.

Proposed Changes

Modified synchronization and parameters of the Write Heartbeat Timer to ensure it can only fire once. It will then be started again at the end of the callback.

This resolves an issue that can occur when writing a heartbeat frame to the socket fails (e.g. broken connection) and the Write Heartbeat Timer continued to fire and block on the lockable _heartbeatWriteLock object. Over time this would eventually exhaust the ThreadPool if the socket write didn't return.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation (correction or otherwise)
  • Cosmetics (whitespace, appearance)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

- Resolves an issue with the Heartbeat Write Timer repeatedly firing and blocking on the sychronization lock. The timer will now fire once and be restarted (if not disposed) at the end of method processing.
@ashneilson ashneilson changed the title Enhancement Improved Heartbeat Write Timer Handling Apr 3, 2019
@ashneilson
Copy link
Contributor Author

It appears builds are failing due to configuration issues?

@michaelklishin
Copy link
Member

@ash-ricado you are correct, CI needs propping every so often. We will QA this, don't worry.

@michaelklishin
Copy link
Member

@ash-ricado any way we can reasonably reliably reproduce this?

Copy link
Contributor

@kjnilsson kjnilsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fine to me apart from the interval change which I don't know the reason for.

{
_heartbeatReadTimer = new Timer(HeartbeatReadTimerCallback);

_heartbeatReadTimer.Change(300, Timeout.Infinite);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change from 200 to 300 here?

Copy link
Contributor Author

@ashneilson ashneilson Apr 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply changed this to prevent both the Read and Write Timers firing at the same time. I was considering using Random.Next(100,300) to ensure that with multiple connections, the timers will fire at different times.

Not required to fix the issue this PR deals with. More than happy to remove if desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We strongly discourage heartbeat timeouts < 5 seconds so any timer interval < 1s should be reasonable, and anything < 500 ms is optimal IMO.

@michaelklishin
Copy link
Member

This looks reasonable. @ash-ricado please undo the interval change (or justify it) and provide some detail on the problem this PR addresses.

Copy link
Member

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait for some extra details first.

@ashneilson
Copy link
Contributor Author

@michaelklishin Apologies for the delay in my response.

Problem solved by this PR
Resolves an issue that can occur when writing a heartbeat frame to the socket fails (e.g. broken connection) and the Write Heartbeat Timer continues to fire and block on the lockable _heartbeatWriteLock object. Over time this would eventually exhaust the ThreadPool if the socket write didn't return.

Identified by our use of remote nodes that talk to a RabbitMQ Cluster over connections that can sometimes become unstable. Monitoring allowed us to see the ThreadPool increase over a period of an hour until the maximum number of threads was consumed.

Should be able to reliably reproduce by publishing data (12K or larger) to a channel every 5 seconds, then drop the network interface that provides access to the RabbitMQ Server. The Heartbeat Write Timer should continue to fire and wait on the lock.

NOTE: This does rely on the socket hanging after an unclean network disconnection. If the socket is able to be cleanly shutdown, the attempt to write frames will return immediately.

@michaelklishin
Copy link
Member

OK, so injecting a latency spike with Toxiproxy or similar would do. Thank you.

@michaelklishin michaelklishin merged commit 5f7ccc4 into rabbitmq:master Apr 8, 2019
@ashneilson ashneilson deleted the heartbeat-write-deadlock branch January 26, 2021 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants