Exception during recovery causes recovery failure #658
Comments
We cannot draw any conclusions from a single stack trace. Please collect and share server logs and a traffic capture. If you believe you have a decent understanding of the problem and can reproduce it, consider submitting a pull request. Note that connection recovery was also simplified just a few days ago in #656.
It seems that neither the old nor the new version of connection recovery solves the problem when an exception occurs during topology recovery. The client only logs the exception and does nothing to recover the consumer. If the client reconnects again, recovery is triggered again. Probably the simplest way to reproduce this is a cluster with two durable, non-HA queues, each living on a different node. If we stop one of the nodes, the client reconnects to the other node but cannot start consuming from the queue that currently has no master node, so an exception is thrown. The expected behavior is that the consumer keeps trying to recover forever (as connection recovery does), in this case until the node that owns the queue comes back online. This means we cannot trust topology recovery and instead have to implement our own wrapper around it. You can check the code:
This library cannot know how it should recover from topology recovery failures. It works very well for a pretty significant number of users. The docs do not promise that it will cover every case. A contribution that makes it possible to react to topology exceptions would be considered.
Retry logic and filtering for recovery has been added in the Java client. Even though this is not a trivial task, a PR based on the Java implementation would be welcome.
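For anyone picking this up, the general idea of "retry with filtering" can be sketched roughly as below. This is only an illustration of the pattern, not the Java client's actual implementation or API: `RecoveryRetry`, `run`, and all of its parameters are hypothetical names, and the recovery step is an arbitrary `Callable` standing in for something like re-registering a consumer.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of any RabbitMQ client API. */
final class RecoveryRetry {
    private RecoveryRetry() {}

    /**
     * Runs the recovery step until it succeeds.
     * Only failures accepted by the retryable filter are retried; anything
     * else is rethrown immediately. maxAttempts <= 0 means "retry forever".
     */
    static <T> T run(Callable<T> step,
                     Predicate<Exception> retryable,
                     int maxAttempts,
                     long delayMillis) throws Exception {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return step.call();
            } catch (Exception e) {
                boolean exhausted = maxAttempts > 0 && attempt >= maxAttempts;
                if (exhausted || !retryable.test(e)) {
                    throw e; // give up: out of attempts or filtered out
                }
                Thread.sleep(delayMillis); // fixed delay between attempts
            }
        }
    }
}
```

With `maxAttempts` set to 0, a step that fails because the queue's node is down would be retried until the node comes back, which matches the behavior asked for above. The filter keeps errors that cannot be fixed by waiting from being retried indefinitely.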
This will be addressed by #1312
I've run into an issue where an exception that happens after recovery causes the connection to be closed, with no further attempt to recover.
As far as I can tell from the logs, this is what's happening:
- RecoverySucceeded event raised
- ModelShutdown event raised with close reason 530 NOT_ALLOWED
- ConnectionShutdown event is not raised

From a quick look over the code it seems like the problem is here:
rabbitmq-dotnet-client/projects/client/RabbitMQ.Client/src/client/impl/Connection.cs
Lines 575 to 586 in 015c517
There is already a close reason set but there is another exception happening
Here is the full exception taken from m_shutdownReport