With automatic recovery turned on and the heartbeat interval set to 30 seconds, we noticed that after RabbitMQ was killed and restarted, Bunny 2.2.2 would detect and recover from the network failure, and then repeatedly detect and "recover" from further "network failures" every ~60 seconds.
We can reproduce the issue from our Ubuntu Docker container, talking to RabbitMQ 3.5.4 running in another Docker container, by running Bunny::new(automatic_recovery: true, heartbeat: 30, log_level: :debug, host: 'mq').start and then killing and restarting the RabbitMQ container.
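For reference, the repro boils down to a short script along these lines (the host name mq is just our RabbitMQ container's name on the Docker network; gem versions were bunny 2.2.2 and amq-protocol 2.0.1):

# Connect with automatic recovery and a 30 s heartbeat, then kill and restart
# the RabbitMQ container while this process stays up, and watch the debug log.
require "bunny"

conn = Bunny.new(automatic_recovery: true,
                 heartbeat:          30,
                 log_level:          :debug,
                 host:               "mq")
conn.start

sleep   # keep the reader loop and heartbeat sender running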
After the RabbitMQ restart, Bunny was able to recover:
irb(main):003:0> E, [2016-02-03T21:43:50.046663 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: Connection reset by peer (Errno::ECONNRESET)
W, [2016-02-03T21:43:50.046922 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:43:55.047828 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:44:00.049828 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:44:00.050039 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:44:00.050976 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:44:00.272938 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:44:00.292084 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:44:00.292245 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:44:00.292468 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:44:00.292563 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:44:00.319744 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
E, [2016-02-03T21:44:00.320921 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.
E, [2016-02-03T21:44:00.321199 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Backtrace:
E, [2016-02-03T21:44:00.321367 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/amq-protocol-2.0.1/lib/amq/protocol/frame.rb:60:in `decode_header'
E, [2016-02-03T21:44:00.321437 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:245:in `read_next_frame'
E, [2016-02-03T21:44:00.321519 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:68:in `run_once'
E, [2016-02-03T21:44:00.321606 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
E, [2016-02-03T21:44:00.321665 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `loop'
E, [2016-02-03T21:44:00.321719 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `run_loop'
W, [2016-02-03T21:44:00.321780 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:44:05.322090 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:44:10.322680 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:44:10.323024 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:44:10.324261 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:44:10.326424 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:44:10.326830 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:44:10.326966 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:44:10.327297 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:44:10.327478 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:44:10.327990 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
But after a few heartbeats, it "detects" another "network failure" even though RabbitMQ and the network were fine:
D, [2016-02-03T21:44:24.329173 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:44:38.330812 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:44:52.332800 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:45:06.334932 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
E, [2016-02-03T21:45:06.355689 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2016-02-03T21:45:06.355841 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:45:11.357462 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:45:16.358137 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:45:16.358328 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:45:16.359669 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:45:16.361179 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:45:16.361555 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:45:16.361629 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:45:16.362203 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:45:16.362349 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:45:16.362648 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
E, [2016-02-03T21:45:16.363248 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.
E, [2016-02-03T21:45:16.363575 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Backtrace:
E, [2016-02-03T21:45:16.363732 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/amq-protocol-2.0.1/lib/amq/protocol/frame.rb:60:in `decode_header'
E, [2016-02-03T21:45:16.363868 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:245:in `read_next_frame'
E, [2016-02-03T21:45:16.363938 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:68:in `run_once'
E, [2016-02-03T21:45:16.364012 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
E, [2016-02-03T21:45:16.364254 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `loop'
E, [2016-02-03T21:45:16.364309 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `run_loop'
W, [2016-02-03T21:45:16.364353 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:45:21.365945 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:45:26.367758 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:45:26.368231 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:45:26.369541 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:45:26.371920 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:45:26.372314 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:45:26.372474 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:45:26.373352 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:45:26.373697 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:45:26.374030 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
The pattern repeated itself about once every minute or so:
D, [2016-02-03T21:45:40.375500 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
D, [2016-02-03T21:45:54.376492 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
D, [2016-02-03T21:46:08.377315 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
E, [2016-02-03T21:46:16.356715 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2016-02-03T21:46:16.356872 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:46:21.358255 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
E, [2016-02-03T21:46:22.364198 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
E, [2016-02-03T21:46:22.364394 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server
Based on the little we know about Bunny, it seems that during the first recovery attempt, after the reader loop detected the failure, Bunny created two new transports and two new reader loops.
The first reader loop was created through here (line numbers may be off due to extra debug logging):
After the first error in read_fully was caught and handled, it seems the reader loop continued on to decode_header, which also failed and triggered a second call to handle_network_failure from here (a toy illustration of this double trigger follows the backtrace below):
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:853:in `start_reader_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:300:in `start'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:671:in `recover_from_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:641:in `handle_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:44:in `rescue in block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `run_loop'
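To make that sequence concrete, here is a toy illustration of the double trigger we suspect. This is NOT Bunny or amq-protocol source; the method names echo the backtrace above, but the bodies are ours:

def handle_network_failure(source)
  puts "handle_network_failure called from #{source}"
end

# Stand-in for the low-level read: the socket error is caught and "handled"
# here (first recovery attempt), but empty data is still handed back.
def read_fully
  raise Errno::ECONNRESET
rescue Errno::ECONNRESET
  handle_network_failure("read_fully error handling")
  ""
end

# Stand-in for AMQ::Protocol::Frame.decode_header, which cannot do anything
# useful with an empty buffer.
def decode_header(data)
  raise "Empty response received from the server." if data.empty?
end

# Stand-in for the reader loop's rescue clause, which treats the resulting
# empty-response error as a second, independent network failure.
def run_once
  decode_header(read_fully)
rescue RuntimeError
  handle_network_failure("reader loop rescue")
end

run_once
# => handle_network_failure called from read_fully error handling
# => handle_network_failure called from reader loop rescue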
It seems that two reader loop threads with different transports are now running concurrently, while a single heartbeat sender sends heartbeats over only one of the transports. After ~66 seconds (2.2 x heartbeat), the reader loop thread whose transport had no heartbeats timed out.
That kicked off another round of network failure recovery: a new transport was created, a new reader loop was created, and the heartbeat sender was restarted on the new transport. Once again there were two reader loops with two transports, and only one of them had heartbeats. The same pattern repeated 66 seconds later, when the thread without heartbeats timed out.
We were able to fix the symptom by adding the line @reader_loop.stop if @reader_loop to handle_network_failure() here, which flags the existing reader loop to stop before a new one is created (see the toy sketch below). I suspect adding that line here may work too.
BTW, we were not able to reproduce the same symptom on OS X, not sure why.
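To make the intent of that one-line guard concrete, here is a self-contained toy (not Bunny code; ToyReaderLoop, ToySession and their methods are made-up names) contrasting recovery with and without stopping the previous reader loop first:

class ToyReaderLoop
  attr_reader :thread

  def initialize
    @stopping = false
    # stand-in for the thread that would otherwise block on transport reads
    @thread = Thread.new { sleep 0.05 until @stopping }
  end

  def stop
    @stopping = true
  end
end

class ToySession
  def initialize(stop_old_loop:)
    @stop_old_loop = stop_old_loop
  end

  # stand-in for handle_network_failure + recover_from_network_failure
  def handle_network_failure
    @reader_loop.stop if @stop_old_loop && @reader_loop   # the added guard
    @reader_loop = ToyReaderLoop.new
  end
end

[false, true].each do |guarded|
  session = ToySession.new(stop_old_loop: guarded)
  loops = 3.times.map { session.handle_network_failure }
  sleep 0.2   # give stopped loops time to exit
  alive = loops.count { |l| l.thread.alive? }
  puts "guard #{guarded ? 'on' : 'off'}: #{alive} reader loop thread(s) still running"
end
# guard off: 3 reader loop thread(s) still running
# guard on: 1 reader loop thread(s) still running

Without the guard, every recovery leaves the previous reader loop running against a stale transport that no longer receives heartbeats, which is exactly the orphaned thread that times out ~66 seconds later in the logs above.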
Any exception on the reader loop will result in a recovery attempt because Bunny cannot make assumptions about whether its socket is still actually functional.
It sounds like a Docker-specific issue, but please submit a PR that helps your case. I can test it some more.