Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception during recovery causes recovery failure #658

Closed
mikenorgate opened this issue Sep 17, 2019 · 6 comments · Fixed by #1312
Closed

Exception during recovery causes recovery failure #658

mikenorgate opened this issue Sep 17, 2019 · 6 comments · Fixed by #1312
Labels
enhancement help wanted next-gen-todo If a rewrite happens, address this issue.
Milestone

Comments

@mikenorgate
Copy link
Contributor

I've run into an issue where an exception happens after recovery which causes the connection to be closed and no attempt to recover.

As far as I can tell from logs this is what's happening

  • RecoverySucceeded event raised
  • ModelShutdown event raised with close reason 530 NOT_ALLOWED
  • Connection is closed but ConnectionShutdown event is not raised

From a quick look over the code it seems like the problem is here:

public void HandleMainLoopException(ShutdownEventArgs reason)
{
if (!SetCloseReason(reason))
{
LogCloseError("Unexpected Main Loop Exception while closing: "
+ reason, new Exception(reason.ToString()));
return;
}
OnShutdown();
LogCloseError("Unexpected connection closure: " + reason, new Exception(reason.ToString()));
}

There is already a close reason set but there is another exception happening

Here is the full exception taken from m_shutdownReport

Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
   at RabbitMQ.Client.Impl.InboundFrame.ReadFrom(NetworkBinaryReader reader)
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
   at RabbitMQ.Client.Framing.Impl.Connection.ClosingLoop()
@michaelklishin
Copy link
Member

We cannot draw any conclusions with a single stack trace. Please collect and share server logs and a traffic capture. If you believe you have a decent understanding of the problem and can reproduce it, consider submitting a pull request.

Note that connection recovery has been simplified just a few days ago in #656, too.

@Lashas83
Copy link

Lashas83 commented Nov 22, 2019

Seems that neither old, neither new version of connection recovery actually solves the problem if exceptions occurs while trying to recover topology. This only logs an exception, but does nothing to recover the consumer. But if client tries to reconnect once again - then recoverry will be triggered once again.

Probably the simplest way to reproduce is to have cluster with durable, but not HA 2 queues, where each of queue would live on different node. If we stop one of the node - client will reconnect to other node, but will not be able to start consuming messages from the queue, which has no master node right now, and will throw an exception. Expected behavior would be that consumer will try to recover it forever (like with connection recovery). In this case - until node, which owns that queue would come back online.

What this means - that we cannot trust topology recovery, and instead we need to implement our own wrapper around it to recover it.

You can actually check the code:
https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/v5.1.1/projects/client/RabbitMQ.Client/src/client/impl/AutorecoveringConnection.cs#L892 -
https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/master/projects/client/RabbitMQ.Client/src/client/impl/AutorecoveringConnection.cs#L800
RecordedConsumer.Recover throws an exception - that consumer is not tried to recover once again -as the exception will be only logged, but nothing else will be done with it. And actually there is no way to intercept that exception and act on it.

@michaelklishin
Copy link
Member

This library cannot know how it should recover from topology recovery failures. It works very well for a pretty significant number of users. The docs do not promise that it will cover every case.

A contribution that makes it possible to react to topology exceptions would be considered.

@acogoluegnes
Copy link

Retry logic and filtering for recovery has been added in the Java client. Even though this is not a trivial task, a PR based on the Java implementation would be welcome.

@lukebakken lukebakken added this to the 7.0.0 milestone Feb 8, 2020
@lukebakken lukebakken added enhancement help wanted next-gen-todo If a rewrite happens, address this issue. labels Feb 8, 2020
@lukebakken lukebakken modified the milestones: 8.0.0, 7.0.0 Mar 8, 2022
@lukebakken lukebakken modified the milestones: 7.0.0, 6.5.0 Mar 14, 2023
@lukebakken
Copy link
Contributor

This will be addressed by #1312

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted next-gen-todo If a rewrite happens, address this issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants