Exception during recovery causes recovery failure #658

mikenorgate · 2019-09-17T14:13:47Z

I've run into an issue where an exception thrown after recovery causes the connection to be closed, and no further recovery is attempted.

As far as I can tell from the logs, this is what happens:

RecoverySucceeded event raised
ModelShutdown event raised with close reason 530 NOT_ALLOWED
Connection is closed but ConnectionShutdown event is not raised

From a quick look over the code, the problem seems to be here:

rabbitmq-dotnet-client/projects/client/RabbitMQ.Client/src/client/impl/Connection.cs

Lines 575 to 586 in 015c517

    
    public void HandleMainLoopException(ShutdownEventArgs reason)
    {
        if (!SetCloseReason(reason))
        {
            // A close reason is already set: log the error and return without calling OnShutdown()
            LogCloseError("Unexpected Main Loop Exception while closing: "
                          + reason, new Exception(reason.ToString()));
            return;
        }
        OnShutdown();
        LogCloseError("Unexpected connection closure: " + reason, new Exception(reason.ToString()));
    }

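A close reason is already set when another exception occurs, so `SetCloseReason` returns false and the method only logs the error and returns early, without calling `OnShutdown()`.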
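A minimal, self-contained model of that control flow may make the failure mode easier to see. `ToyConnection`, its string-based close reason, and the use of `Console.WriteLine` in place of `LogCloseError` are hypothetical simplifications, not RabbitMQ.Client code; only the branch structure mirrors the quoted `HandleMainLoopException`. It assumes, as the report suggests, that a close reason was already recorded before the socket error reached the main loop.

```csharp
using System;

// Hypothetical model of the control flow in Connection.HandleMainLoopException.
// ToyConnection is not a RabbitMQ.Client type; strings stand in for ShutdownEventArgs.
public sealed class ToyConnection
{
    private string _closeReason;

    // Stands in for the ConnectionShutdown event.
    public event Action<string> ConnectionShutdown;

    // Like SetCloseReason in the quoted code: only the first reason is kept,
    // and false is returned if one was already recorded.
    public bool SetCloseReason(string reason)
    {
        if (_closeReason != null)
        {
            return false;
        }
        _closeReason = reason;
        return true;
    }

    public void HandleMainLoopException(string reason)
    {
        if (!SetCloseReason(reason))
        {
            // Early return: the error is only logged, OnShutdown() never runs.
            Console.WriteLine("Unexpected Main Loop Exception while closing: " + reason);
            return;
        }
        OnShutdown();
        Console.WriteLine("Unexpected connection closure: " + reason);
    }

    // Raises ConnectionShutdown with the recorded close reason.
    private void OnShutdown()
    {
        ConnectionShutdown?.Invoke(_closeReason);
    }
}
```

If a close reason is set first and a main loop exception follows, a handler attached to `ConnectionShutdown` never runs, which matches the symptom in the report.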

Here is the full exception, taken from m_shutdownReport:

Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
   at RabbitMQ.Client.Impl.InboundFrame.ReadFrom(NetworkBinaryReader reader)
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
   at RabbitMQ.Client.Framing.Impl.Connection.ClosingLoop()


michaelklishin · 2019-09-17T14:18:20Z

We cannot draw any conclusions with a single stack trace. Please collect and share server logs and a traffic capture. If you believe you have a decent understanding of the problem and can reproduce it, consider submitting a pull request.

Note also that connection recovery was simplified just a few days ago in #656.

Lashas83 · 2019-11-22T09:05:40Z

It seems that neither the old nor the new version of connection recovery solves the problem when an exception occurs while recovering topology. The exception is only logged; nothing is done to recover the consumer. Recovery is triggered again only if the client reconnects once more.

Probably the simplest way to reproduce this is a cluster with two durable but non-HA queues, each living on a different node. If we stop one of the nodes, the client reconnects to the other node but cannot start consuming from the queue whose master node is down, and an exception is thrown. The expected behavior is that the client keeps retrying consumer recovery indefinitely (as it does with connection recovery), in this case until the node that owns that queue comes back online.

This means we cannot rely on topology recovery and instead have to implement our own wrapper around it.
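One way to build such a wrapper is a retry loop with capped exponential backoff around whatever re-establishes the consumer (for example, the code that calls `BasicConsume` again). This is a sketch under assumptions: `RecoveryRetry`, `RetryUntilSuccess`, and its parameters are hypothetical names, not part of RabbitMQ.Client, and the loop retries until it succeeds or the caller cancels.

```csharp
using System;
using System.Threading;

public static class RecoveryRetry
{
    // Hypothetical helper (not part of RabbitMQ.Client): calls `attempt` until it
    // succeeds, reporting each failure to `onError` and waiting between tries.
    // The delay doubles after every failure but never exceeds `maxDelay`.
    // Returns the number of attempts that were made.
    public static int RetryUntilSuccess(
        Action attempt,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        Action<int, Exception> onError,
        CancellationToken cancellationToken = default)
    {
        TimeSpan delay = initialDelay;
        for (int attemptNumber = 1; ; attemptNumber++)
        {
            // Stop retrying if the caller gave up (e.g. on application shutdown).
            cancellationToken.ThrowIfCancellationRequested();
            try
            {
                attempt();
                return attemptNumber;
            }
            catch (Exception ex) when (!(ex is OperationCanceledException))
            {
                // Surface the failure instead of swallowing it.
                onError(attemptNumber, ex);
            }

            Thread.Sleep(delay);
            delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
        }
    }
}
```

Retrying without an attempt limit matches the "recover forever" expectation above; passing a `CancellationToken` gives the application a way to stop the loop.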

You can check the code:
https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/v5.1.1/projects/client/RabbitMQ.Client/src/client/impl/AutorecoveringConnection.cs#L892
https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/master/projects/client/RabbitMQ.Client/src/client/impl/AutorecoveringConnection.cs#L800
If RecordedConsumer.Recover throws an exception, recovery of that consumer is not retried: the exception is only logged and nothing else is done with it. There is also no way to intercept that exception and act on it.
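The behavior being asked for can be sketched as a recovery pass that reports each failure to the caller instead of only logging it, and returns the consumers that still need recovering. `ToyRecordedConsumer`, `ToyTopologyRecovery`, and `RecoverConsumers` are hypothetical stand-ins, not the library's `RecordedConsumer` or its actual recovery code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a recorded consumer; not RabbitMQ.Client's RecordedConsumer.
public sealed class ToyRecordedConsumer
{
    public ToyRecordedConsumer(string queue, Action recover)
    {
        Queue = queue;
        Recover = recover;
    }

    public string Queue { get; }

    // The action that re-creates the consumer on the new connection.
    public Action Recover { get; }
}

public static class ToyTopologyRecovery
{
    // Tries to recover every consumer. Failures are handed to `onRecoveryError`
    // and collected, so the caller can decide how (and whether) to retry them.
    public static List<ToyRecordedConsumer> RecoverConsumers(
        IEnumerable<ToyRecordedConsumer> consumers,
        Action<ToyRecordedConsumer, Exception> onRecoveryError)
    {
        var failed = new List<ToyRecordedConsumer>();
        foreach (var consumer in consumers)
        {
            try
            {
                consumer.Recover();
            }
            catch (Exception ex)
            {
                onRecoveryError(consumer, ex);
                failed.Add(consumer);
            }
        }
        return failed;
    }
}
```

Returning the failed consumers lets the application feed them into its own retry policy, for example one that keeps waiting until the node that owns the queue is back.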

michaelklishin · 2019-11-26T08:09:12Z

This library cannot know how it should recover from topology recovery failures. It works very well for a pretty significant number of users. The docs do not promise that it will cover every case.

A contribution that makes it possible to react to topology exceptions would be considered.

acogoluegnes · 2019-12-02T09:28:57Z

Retry logic and filtering for recovery has been added in the Java client. Even though this is not a trivial task, a PR based on the Java implementation would be welcome.

lukebakken · 2023-03-17T17:13:24Z

This will be addressed by #1312

lukebakken · 2023-03-25T15:04:30Z

@rosca-sabina @mikenorgate

lukebakken added this to the 7.0.0 milestone Feb 8, 2020

lukebakken added the enhancement, help wanted, and next-gen-todo labels Feb 8, 2020

lukebakken modified the milestones: 8.0.0, 7.0.0 Mar 8, 2022

rosca-sabina mentioned this issue Mar 14, 2023

Add custom filtering and exception handling to topology recovery #1312

Merged


lukebakken modified the milestones: 7.0.0, 6.5.0 Mar 14, 2023

rosca-sabina mentioned this issue Mar 20, 2023

Add custom filtering and exception handling to topology recovery on 6.x #1316

Merged


lukebakken closed this as completed in #1312 Mar 21, 2023
