HeartbeatSender raises Errno::EPIPE on main thread when pipe is closed #559
If you run with `{ automatically_recover: false, threaded: false }`, the heartbeat failure is raised on the main thread.
I thought I misinterpreted the issue and that an I/O exception is thrown when a connection is closed cleanly. That doesn't seem to be the case.
Because of that, and the fact that the example uses Toxiproxy, I suspect that the exception only occurs when RabbitMQ's heartbeat mechanism detects the timeout and closes the TCP connection, as it should. In that case it is expected that Bunny may get an I/O exception on a heartbeat (or any other) frame write, and it is expected that with the settings used by the OP there will be an exception on the main thread (or rather, the thread that instantiates a `Bunny::Session`).
The `threaded` option doesn't affect which thread the heartbeat thread raises its exception on. If this example were run with the `threaded: true` option, the reader loop would error first and likewise raise an exception on the main session thread. It seems that when automatic recovery is off, the reader loop or the heartbeat sender raises exceptions on the session thread whenever I/O errors occur. It looks like `HeartbeatSender#run` will never be able to rescue `IOError`/`Exception`, because `Transport#write_without_timeout` rescues them itself and either handles them via automatic recovery or raises them on the session thread.
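A self-contained sketch of the routing described above; the class and method names are illustrative stand-ins, not Bunny's actual source. It shows why a rescue inside the heartbeat loop never fires: the write path consumes the error itself and re-raises it on the session (origin) thread.

```ruby
class FakeTransport
  def initialize(origin_thread, automatically_recover:)
    @origin_thread = origin_thread
    @automatically_recover = automatically_recover
  end

  # Stands in for Transport#write_without_timeout: the rescue lives here,
  # one layer below the heartbeat loop.
  def write_without_timeout(_frame)
    raise Errno::EPIPE # simulate a write to a closed socket
  rescue SystemCallError, IOError => e
    if @automatically_recover
      puts "would kick off connection recovery here"
    else
      @origin_thread.raise(e) # surface the error on the session (origin) thread
    end
  end
end

begin
  transport = FakeTransport.new(Thread.current, automatically_recover: false)
  heartbeat = Thread.new do
    begin
      transport.write_without_timeout(:heartbeat_frame)
    rescue IOError, SystemCallError
      puts "never reached: the transport already consumed the error"
    end
  end
  heartbeat.join
  sleep 1 # give the asynchronous raise time to arrive
rescue Errno::EPIPE => e
  puts "session thread interrupted: #{e.class}"
end
```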
If automatic recovery is disabled, what else can Bunny do but throw an exception? It was explicitly told not to recover.
At most, `HeartbeatSender` should mark the session as closed when there is an issue. Then the next attempt to use Bunny will fail with a `ConnectionClosedError`. How can it be by design to have unreachable code in `HeartbeatSender`? It seems the Java client's heartbeat sender doesn't even close the connection; it just swallows all `IOException`s in `HeartBeatSender.java`. I feel like that's what it should do in Ruby: `HeartbeatSender` should make a best effort to keep the connection active at a regular interval and swallow any exceptions that occur along the way. It looks like that's exactly what `HeartbeatSender` initially did: 55d57e3
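A minimal sketch of the behaviour proposed here, in the spirit of the Java client: send heartbeat frames on a best-effort basis, swallow I/O errors, and at most mark the session as closed. The collaborator methods (`send_heartbeat_frame`, `mark_closed!`) are hypothetical names, not Bunny's API.

```ruby
class BestEffortHeartbeatSender
  def initialize(transport, session, interval)
    @transport = transport # assumed to respond to #send_heartbeat_frame (hypothetical)
    @session   = session   # assumed to respond to #mark_closed! (hypothetical)
    @interval  = interval
  end

  def start
    @thread = Thread.new do
      loop do
        sleep @interval
        begin
          @transport.send_heartbeat_frame
        rescue IOError, SystemCallError
          # Best effort: a failed heartbeat write must not interrupt
          # application threads. Mark the session closed so that the next
          # operation fails with a ConnectionClosedError instead.
          @session.mark_closed!
          break
        end
      end
    end
  end

  def stop
    @thread&.kill
  end
end
```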
The purpose of heartbeats is not to keep the connection alive; it is to detect unavailable peers more quickly.
If automatic recovery is disabled, throwing exceptions is the best option I could think of, and I haven't changed my mind in quite a few years. If recovery is enabled, attempting to recover is the best course of action. Heartbeat write failures could first close the connection; in the case of automatic recovery, that would still kick off a recovery attempt. When it is disabled, I don't know if it would be meaningfully different whether it was an I/O exception or a `ConnectionClosedError`. @acogoluegnes @lukebakken WDYT?
I was referring to this section http://www.rabbitmq.com/heartbeats.html#tcp-proxies where it states that heartbeats can be used to generate network traffic to help with proxies like HAProxy, but you're right, there are several uses.

I'm unsure how the user "opted in" to handling exceptions here. Because `automatically_recover` is false? As it is implemented right now, if `automatically_recover: false` is used, then Bunny can interrupt your application with an exception at any time after the connection is started. AFAIK there is no way to handle these exceptions on the main thread, so the only safe way to do this would be to put Bunny in its own thread.

```ruby
conn = Bunny.new("amqp://guest:guest@localhost:56722",
                 heartbeat_timeout: 10,
                 automatically_recover: false,
                 threaded: false)
conn.start
# Now any heartbeat I/O issue can raise an exception and interrupt your application
MyApp.do_non_rabbit_important_tasks()
```

`automatically_recover: true` isn't always desirable. Let's say I have an application that publishes messages to RabbitMQ and there is a backup RabbitMQ server it can publish to in case of failure. In this instance it would be ideal to have the publish to the downed server fail as quickly as possible so it can publish to the backup server (a sketch of this pattern follows below). If `automatically_recover` is enabled, the first publish will `sleep @network_recovery_interval` and then attempt to reopen the connection. It seems reasonable that we should be able to accomplish this with `automatically_recover: false` without "opting in" to exceptions in my main application from the heartbeat or reader loop threads.
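A sketch of the fast-failover scenario described above. The broker URLs and queue name are made up for illustration, and real code would reuse connections and channels rather than opening them per publish.

```ruby
require "bunny"

def fast_failover_publish(payload)
  primary = Bunny.new("amqp://primary.example.com", automatically_recover: false, threaded: false)
  backup  = Bunny.new("amqp://backup.example.com",  automatically_recover: false, threaded: false)

  begin
    primary.start
    primary.create_channel.default_exchange.publish(payload, routing_key: "messages")
  rescue Bunny::Exception, IOError, SystemCallError
    # With automatically_recover: false the failed operation raises right away,
    # instead of sleeping for network_recovery_interval, so the fallback
    # publish can happen as quickly as possible.
    backup.start
    backup.create_channel.default_exchange.publish(payload, routing_key: "messages")
  end
end
```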
If this issue is accepted, this would be a simple approach to fixing it, IMO: master...sbonebrake:safe_heartbeats
What I don't understand is how it would be different from any other exception thrown on the "session thread" (the thread the session was instantiated in?). If you don't want recovery, exceptions will be thrown on the caller thread (IIRC that's the term we used in the docs). Is an I/O exception not indicative enough? The Java client carries I/O exceptions as the shutdown reason, so in that sense they are not hidden from the user.
If I'm running, let's say, a Rails app that publishes messages when it gets requests:

```ruby
class MessageController < ActionController::Base
  def bunny
    return @bunny if @bunny
    @bunny = Bunny.new(host: "something",
                       automatically_recover: false,
                       threaded: false,
                       heartbeat_interval: 30)
    @bunny.start
    @bunny
  end

  def create
    # Publishing needs a channel; opening one per request keeps the example short
    bunny.create_channel.queue("messages").publish(params[:message_content])
  rescue Bunny::NetworkFailure => e
    logger.error(e)
    render :nothing => true, :status => :service_unavailable
    @bunny = nil
  end

  def index
    # Nothing Bunny related
    Messages.last(10)
  end
end
```

If I get a request to `index` while the connection is down, the heartbeat thread can raise `Errno::EPIPE` on the thread that instantiated the connection, even though that action never touches Bunny.
Even if I wrapped the whole Rails application in a `begin`/`rescue`, there is no way to predict where an exception raised from another thread will land.
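A sketch of the "put Bunny in its own thread" workaround mentioned earlier in the thread: all Bunny work happens on one dedicated thread, so the heartbeat sender's asynchronous raise targets that thread rather than whichever request thread happens to be running. The queue-based worker is a hypothetical pattern, not a Bunny API.

```ruby
require "bunny"

publish_queue = Queue.new

bunny_thread = Thread.new do
  begin
    conn = Bunny.new(automatically_recover: false, threaded: false)
    conn.start # this thread becomes the session (origin) thread
    channel = conn.create_channel
    loop do
      payload = publish_queue.pop
      channel.default_exchange.publish(payload, routing_key: "messages")
    end
  rescue Bunny::Exception, IOError, SystemCallError => e
    # Heartbeat and reader-loop failures land here, not in application threads
    warn "Bunny worker stopped: #{e.class}: #{e.message}"
  end
end

# Application threads only ever touch the queue:
publish_queue << "hello"
```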
Have you had any time to think about this further? Maybe @acogoluegnes @lukebakken have thoughts on this?
I have been working on higher-priority items. When I get a couple of hours free to really think about this issue, I'll check it out.
@sbonebrake OK, I see what you are unhappy about: potentially unexpected exceptions. I think we can make Bunny's heartbeat sender do what its Java client counterpart does. I cannot guess how many people would prefer the current behavior, but we have already double-checked that the connection is marked as closed, so there's a reasonable degree of observability of failures in place.
Thanks @michaelklishin. The suggested diff above is a "least effort" fix and isn't the cleanest, but if you'd like me to put it into a PR with tests, I can. I'm unsure if you had other ideas about how to address it.
I confirm that the Java client's heartbeat implementation on the sending side is quite passive: it sends heartbeat frames in the background and indeed ignores write errors. Missed heartbeats from the server are incremented and checked only when a read operation times out. I think this implementation is consistent with the heartbeat spec. It could be more aggressive in the way it detects failures, but failure detection is tricky in practice. I'm not sure that heartbeat sending errors should bubble up to application code, though.
@sbonebrake please submit a PR and we'll discuss a specific solution there. I don't know if there can be a neat way to silence some exceptions ;)
This can be closed now. You can also get a similar problem if you run the spec "when the connection detects missed heartbeats without automatic recovery" with `threaded: false`.
Reproduced with RabbitMQ 3.7.6 and Bunny 2.11.0.
This happens when Bunny is created with `{ automatically_recover: false, threaded: false }`. After the connection is open, if at any point it disconnects during a heartbeat attempt, the exception is raised on the thread that initialized Bunny.
Code for repro here: https://github.com/sbonebrake/bunny_repro/blob/master/heartbeat_failure_app.rb
I'm not 100% certain what is intended to happen when a heartbeat fails, but it seems the heartbeat thread should probably not use the `Transport#write_without_timeout` method, which raises exceptions on the session thread.
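For reference, a condensed sketch of the failure mode (the linked repro is more complete and uses Toxiproxy; this version assumes a local RabbitMQ node whose connection is severed externally, e.g. by stopping the node):

```ruby
require "bunny"

conn = Bunny.new(automatically_recover: false,
                 threaded: false,
                 heartbeat_timeout: 10)
conn.start

begin
  # The "main" thread does work unrelated to RabbitMQ...
  loop { sleep 1 }
rescue Errno::EPIPE, Bunny::NetworkFailure => e
  # ...until the connection drops: the next heartbeat write fails and the
  # error is raised here, on the thread that instantiated the session.
  puts "interrupted by heartbeat failure: #{e.class}"
end
```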