With automatic recovery turned on and the heartbeat interval set to 30 seconds, we noticed that after RabbitMQ was killed and restarted, Bunny 2.2.2 would detect and recover from the network failure, and then repeatedly detect and "recover" from further "network failures" every ~60 seconds.
We can reproduce the issue from our Ubuntu Docker container, talking to RabbitMQ 3.5.4 running in another Docker container, by running Bunny::new(automatic_recovery: true, heartbeat: 30, log_level: :debug, host: 'mq').start and then killing and restarting the RabbitMQ container.
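For reference, the repro boils down to a short script along these lines (the host name mq is just our RabbitMQ container's name on the Docker network; gem versions were bunny 2.2.2 and amq-protocol 2.0.1):

# Connect with automatic recovery and a 30 s heartbeat, then kill and restart
# the RabbitMQ container while this process stays up, and watch the debug log.
require "bunny"

conn = Bunny.new(automatic_recovery: true,
                 heartbeat:          30,
                 log_level:          :debug,
                 host:               "mq")
conn.start

sleep   # keep the reader loop and heartbeat sender running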
After the RabbitMQ restart, Bunny was able to recover:
irb(main):003:0> E, [2016-02-03T21:43:50.046663 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: Connection reset by peer (Errno::ECONNRESET)
W, [2016-02-03T21:43:50.046922 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:43:55.047828 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:44:00.049828 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:44:00.050039 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:44:00.050976 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:44:00.272938 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:44:00.292084 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:44:00.292245 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:44:00.292468 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:44:00.292563 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:44:00.319744 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
E, [2016-02-03T21:44:00.320921 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.
E, [2016-02-03T21:44:00.321199 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Backtrace:
E, [2016-02-03T21:44:00.321367 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/amq-protocol-2.0.1/lib/amq/protocol/frame.rb:60:in `decode_header'
E, [2016-02-03T21:44:00.321437 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:245:in `read_next_frame'
E, [2016-02-03T21:44:00.321519 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:68:in `run_once'
E, [2016-02-03T21:44:00.321606 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
E, [2016-02-03T21:44:00.321665 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `loop'
E, [2016-02-03T21:44:00.321719 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `run_loop'
W, [2016-02-03T21:44:00.321780 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:44:05.322090 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:44:10.322680 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:44:10.323024 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:44:10.324261 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:44:10.326424 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:44:10.326830 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:44:10.326966 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:44:10.327297 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:44:10.327478 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:44:10.327990 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
But after a few heartbeats, it "detects" another "network failure" even though RabbitMQ and the network were fine:
D, [2016-02-03T21:44:24.329173 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:44:38.330812 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:44:52.332800 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:45:06.334932 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
E, [2016-02-03T21:45:06.355689 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2016-02-03T21:45:06.355841 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:45:11.357462 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:45:16.358137 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:45:16.358328 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:45:16.359669 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:45:16.361179 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:45:16.361555 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:45:16.361629 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:45:16.362203 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:45:16.362349 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:45:16.362648 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
E, [2016-02-03T21:45:16.363248 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.
E, [2016-02-03T21:45:16.363575 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Backtrace:
E, [2016-02-03T21:45:16.363732 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/amq-protocol-2.0.1/lib/amq/protocol/frame.rb:60:in `decode_header'
E, [2016-02-03T21:45:16.363868 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:245:in `read_next_frame'
E, [2016-02-03T21:45:16.363938 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:68:in `run_once'
E, [2016-02-03T21:45:16.364012 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
E, [2016-02-03T21:45:16.364254 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `loop'
E, [2016-02-03T21:45:16.364309 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `run_loop'
W, [2016-02-03T21:45:16.364353 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:45:21.365945 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:45:26.367758 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:45:26.368231 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:45:26.369541 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:45:26.371920 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:45:26.372314 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:45:26.372474 #615] INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:45:26.373352 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:45:26.373697 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:45:26.374030 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
The pattern repeated itself about once every minute or so:
D, [2016-02-03T21:45:40.375500 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
D, [2016-02-03T21:45:54.376492 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
D, [2016-02-03T21:46:08.377315 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
E, [2016-02-03T21:46:16.356715 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2016-02-03T21:46:16.356872 #615] WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:46:21.358255 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
E, [2016-02-03T21:46:22.364198 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
E, [2016-02-03T21:46:22.364394 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server
Based on the little we know about Bunny, it seems that during the first recovery attempt, after the reader loop detected the failure, Bunny created two new transports and two new reader loops.
The first reader loop was created through here (line numbers may be off due to extra debug logging):
After the first error in read_fully was caught and handled, it seems the reader loop continued on to decode_header, which also failed and triggered a second call to handle_network_failure from here (a toy illustration of this double trigger follows the backtrace below):
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:853:in `start_reader_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:300:in `start'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:671:in `recover_from_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:641:in `handle_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:44:in `rescue in block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `run_loop'
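To make that sequence concrete, here is a toy illustration of the double trigger we suspect. This is NOT Bunny or amq-protocol source; the method names echo the backtrace above, but the bodies are ours:

def handle_network_failure(source)
  puts "handle_network_failure called from #{source}"
end

# Stand-in for the low-level read: the socket error is caught and "handled"
# here (first recovery attempt), but empty data is still handed back.
def read_fully
  raise Errno::ECONNRESET
rescue Errno::ECONNRESET
  handle_network_failure("read_fully error handling")
  ""
end

# Stand-in for AMQ::Protocol::Frame.decode_header, which cannot do anything
# useful with an empty buffer.
def decode_header(data)
  raise "Empty response received from the server." if data.empty?
end

# Stand-in for the reader loop's rescue clause, which treats the resulting
# empty-response error as a second, independent network failure.
def run_once
  decode_header(read_fully)
rescue RuntimeError
  handle_network_failure("reader loop rescue")
end

run_once
# => handle_network_failure called from read_fully error handling
# => handle_network_failure called from reader loop rescue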
It seems that two reader loop threads with different transports are now running concurrently, while a single heartbeat sender sends heartbeats over only one of the transports. After ~66 seconds (2.2 x heartbeat), the reader loop thread whose transport had no heartbeats timed out.
That kicked off another round of network failure recovery: a new transport was created, a new reader loop was created, and the heartbeat sender was restarted on the new transport. Once again there were two reader loops with two transports, and only one of them had heartbeats. The same pattern repeated 66 seconds later, when the thread without heartbeats timed out.
We were able to fix the symptom by adding the line @reader_loop.stop if @reader_loop to handle_network_failure() here, which flags the existing reader loop to stop before a new one is created (see the toy sketch below). I suspect adding that line here may work too.
BTW, we were not able to reproduce the same symptom on OS X, not sure why.
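To make the intent of that one-line guard concrete, here is a self-contained toy (not Bunny code; ToyReaderLoop, ToySession and their methods are made-up names) contrasting recovery with and without stopping the previous reader loop first:

class ToyReaderLoop
  attr_reader :thread

  def initialize
    @stopping = false
    # stand-in for the thread that would otherwise block on transport reads
    @thread = Thread.new { sleep 0.05 until @stopping }
  end

  def stop
    @stopping = true
  end
end

class ToySession
  def initialize(stop_old_loop:)
    @stop_old_loop = stop_old_loop
  end

  # stand-in for handle_network_failure + recover_from_network_failure
  def handle_network_failure
    @reader_loop.stop if @stop_old_loop && @reader_loop   # the added guard
    @reader_loop = ToyReaderLoop.new
  end
end

[false, true].each do |guarded|
  session = ToySession.new(stop_old_loop: guarded)
  loops = 3.times.map { session.handle_network_failure }
  sleep 0.2   # give stopped loops time to exit
  alive = loops.count { |l| l.thread.alive? }
  puts "guard #{guarded ? 'on' : 'off'}: #{alive} reader loop thread(s) still running"
end
# guard off: 3 reader loop thread(s) still running
# guard on: 1 reader loop thread(s) still running

Without the guard, every recovery leaves the previous reader loop running against a stale transport that no longer receives heartbeats, which is exactly the orphaned thread that times out ~66 seconds later in the logs above.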
Any exception on the reader loop will result in a recovery attempt because Bunny cannot make assumptions about whether its socket is still actually functional.
It sounds like a Docker-specific issue, but please submit a PR that helps your case. I can test it some more.