Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeated automatic recovery after rabbitmq restart #376

Closed
bkanhoopla opened this issue Feb 3, 2016 · 2 comments
Closed

Repeated automatic recovery after rabbitmq restart #376

bkanhoopla opened this issue Feb 3, 2016 · 2 comments
Assignees

Comments

@bkanhoopla
Copy link
Contributor

With automatic recovery turned on and heartbeat set to 30 seconds, we noticed that after rabbitmq was killed and restarted, Bunny 2.2.2 would detect and recover from the network failure, and then repeatedly detect and recover from more "network failures" over and over again every ~60 seconds.

We can reproduce the issue from our ubuntu docker container, talking to rabbitmq 3.5.4 running in another docker container, by running Bunny::new(automatic_recovery: true, heartbeat: 30, log_level: :debug, host: 'mq').start, then kill and restart the rabbitmq container:

From Bunny:

rails@0321f4fb634f:/opt/rails$ irb
irb(main):001:0> require 'bunny'  
=> true
irb(main):002:0> Bunny::new(automatic_recovery: true, heartbeat: 30, log_level: :debug, host: 'mq').start   
D, [2016-02-03T21:43:40.915077 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:43:40.917025 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:43:40.917439 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:43:40.917554 #615]  INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:43:40.917830 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:43:40.917938 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:43:40.918338 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
=> #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>

After rabbitmq restart, Bunny was able to recover:

irb(main):003:0> E, [2016-02-03T21:43:50.046663 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: Connection reset by peer (Errno::ECONNRESET)
W, [2016-02-03T21:43:50.046922 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:43:55.047828 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:44:00.049828 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:44:00.050039 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:44:00.050976 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:44:00.272938 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:44:00.292084 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:44:00.292245 #615]  INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:44:00.292468 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:44:00.292563 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:44:00.319744 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
E, [2016-02-03T21:44:00.320921 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.
E, [2016-02-03T21:44:00.321199 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Backtrace: 
E, [2016-02-03T21:44:00.321367 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/amq-protocol-2.0.1/lib/amq/protocol/frame.rb:60:in `decode_header'
E, [2016-02-03T21:44:00.321437 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:245:in `read_next_frame'
E, [2016-02-03T21:44:00.321519 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:68:in `run_once'
E, [2016-02-03T21:44:00.321606 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
E, [2016-02-03T21:44:00.321665 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `loop'
E, [2016-02-03T21:44:00.321719 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `run_loop'
W, [2016-02-03T21:44:00.321780 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:44:05.322090 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:44:10.322680 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:44:10.323024 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:44:10.324261 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:44:10.326424 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:44:10.326830 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:44:10.326966 #615]  INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:44:10.327297 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:44:10.327478 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:44:10.327990 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...

But after a few heartbeat, it "detects" another "network failure" again even though rabbit and the network were fine:

D, [2016-02-03T21:44:24.329173 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:44:38.330812 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:44:52.332800 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
D, [2016-02-03T21:45:06.334932 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:44:10 +0000, interval (s): 14
E, [2016-02-03T21:45:06.355689 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2016-02-03T21:45:06.355841 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:45:11.357462 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:45:16.358137 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:45:16.358328 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:45:16.359669 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:45:16.361179 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:45:16.361555 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:45:16.361629 #615]  INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:45:16.362203 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:45:16.362349 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:45:16.362648 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...
E, [2016-02-03T21:45:16.363248 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server.
E, [2016-02-03T21:45:16.363575 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Backtrace: 
E, [2016-02-03T21:45:16.363732 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/amq-protocol-2.0.1/lib/amq/protocol/frame.rb:60:in `decode_header'
E, [2016-02-03T21:45:16.363868 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:245:in `read_next_frame'
E, [2016-02-03T21:45:16.363938 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:68:in `run_once'
E, [2016-02-03T21:45:16.364012 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
E, [2016-02-03T21:45:16.364254 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `loop'
E, [2016-02-03T21:45:16.364309 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>:     /home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:32:in `run_loop'
W, [2016-02-03T21:45:16.364353 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:45:21.365945 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
D, [2016-02-03T21:45:26.367758 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
W, [2016-02-03T21:45:26.368231 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Retrying connection on next host in line: mq:5672
D, [2016-02-03T21:45:26.369541 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent protocol preamble
D, [2016-02-03T21:45:26.371920 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.start-ok
D, [2016-02-03T21:45:26.372314 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval negotiation: client = 30, server = 580, result = 30
I, [2016-02-03T21:45:26.372474 #615]  INFO -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Heartbeat interval used (in seconds): 30
D, [2016-02-03T21:45:26.373352 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.tune-ok with heartbeat interval = 30, frame_max = 131072, channel_max = 65535
D, [2016-02-03T21:45:26.373697 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sent connection.open with vhost = /
D, [2016-02-03T21:45:26.374030 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Initializing heartbeat sender...

The pattern repeated itself, about once every minute or so

D, [2016-02-03T21:45:40.375500 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
D, [2016-02-03T21:45:54.376492 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
D, [2016-02-03T21:46:08.377315 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Sending a heartbeat, last activity time: 2016-02-03 21:45:26 +0000, interval (s): 14
E, [2016-02-03T21:46:16.356715 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2016-02-03T21:46:16.356872 #615]  WARN -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Recovering from a network failure...
D, [2016-02-03T21:46:21.358255 #615] DEBUG -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: About to start connection recovery...
E, [2016-02-03T21:46:22.364198 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
E, [2016-02-03T21:46:22.364394 #615] ERROR -- #<Bunny::Session:0x7f422466bad0 guest@mq:5672, vhost=/, addresses=[mq:5672]>: Exception in the reader loop: AMQ::Protocol::EmptyResponseError: Empty response received from the server

Based on the little we know about Bunny, it seems like during the first recovery attempt after the reader loop detects the failure, it tried to recover from the failure and created 2 new transports and 2 new reader loops.

The first reader loop was created thru here (line number may be off due to extra debug logging):

/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:853:in `start_reader_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:300:in `start'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:671:in `recover_from_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:641:in `handle_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:221:in `rescue in read_fully'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:213:in `read_fully'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/transport.rb:236:in `read_next_frame'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:70:in `run_once'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:37:in `block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `run_loop'

After the first error in read_fully was caught and handled, it seems like it continued on to decode_header which also failed and triggered a second call to handle_network_failure from here:

/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:853:in `start_reader_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:300:in `start'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:671:in `recover_from_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/session.rb:641:in `handle_network_failure'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:44:in `rescue in block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:35:in `block in run_loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `loop'
/home/rails/.rbenv/versions/2.1.6/lib/ruby/gems/2.1.0/gems/bunny-2.2.2/lib/bunny/reader_loop.rb:34:in `run_loop'

It seems like now there are 2 reader loop threads with different transports are running concurrently while there is only one heartbeat sender sending heartbeats over only one of the transport. After ~66 seconds (2.2 x heartbeat), the reader loop thread that didn't have a heartbeat timed out.

That kicked off another round of network failure recovery, creating a new transport, creating a new reader loop, and restarting the heartbeat sender on the new transport. Now once again we have 2 reader loops with 2 transports, and only one of them has a heartbeat. The same pattern repeated itself again 66 seconds later and the one thread that didn't have a heartbeat timed out.

We were able to fix the symptom by adding this line @reader_loop.stop if @reader_loop to handle_network_failure() to here, which will flag the reader_loop to stop before it tries to create a new one.

I suspect adding that line here may work too.

BTW, we were not able to reproduce the same symptom on OS X, not sure why.

@michaelklishin
Copy link
Member

Any exception on the reader loop will result in a recovery attempt because Bunny cannot make assumptions about whether its socket is still actually functional.

It sounds like a Docker-specific issue but please submit a PR that helps your case. I can test it some more.

@michaelklishin
Copy link
Member

OK, this introduces no regressions and makes sense to me. #377 is merged and will be in 2.2.3 and 2.3.0.

take-five added a commit to salemove/freddy that referenced this issue Sep 26, 2016
This resolves an issue with automatic recovery from network failures
in docker containers.

See ruby-amqp/bunny#376
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants