reconnect_backoff_max_ms does not work in case Kafka cluster drops #1945

Closed
georgehdd opened this issue Nov 5, 2019 · 3 comments

georgehdd commented Nov 5, 2019

When the Kafka cluster goes down, KafkaClient keeps trying to reconnect forever and does not fail after reconnect_backoff_max_ms has elapsed.

Reproduce: Run the following code while Kafka is available:

from kafka import KafkaConsumer
consumer = KafkaConsumer("some_topic", bootstrap_servers="kafkabroker:9092")
for message in consumer:
    print("Received Message")

Then while it's running, take Kafka down.

Expected behavior: After reconnect_backoff_max_ms has elapsed, the client should fail or raise an exception.

Actual behavior: the loop "for message in consumer:" blocks indefinitely.

Owner

dpkp commented Dec 29, 2019

Apologies if the documentation is confusing. This is the expected behavior. The purpose of this maximum is to cap the delay between reconnect attempts, so that once a cluster recovers, clients initiate a reconnect within this maximum time. Without a cap, exponential backoff could grow the delay to multiple days or weeks, and the client would stay disconnected despite cluster recovery.

       reconnect_backoff_max_ms (int): The maximum amount of time in
            milliseconds to wait when reconnecting to a broker that has
            repeatedly failed to connect. If provided, the backoff per host
            will increase exponentially for each consecutive connection
            failure, up to this maximum. To avoid connection storms, a
            randomization factor of 0.2 will be applied to the backoff
            resulting in a random range between 20% below and 20% above
            the computed value. Default: 1000.
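
To make the docstring concrete, here is a minimal sketch of a capped exponential backoff with 20% jitter. It is not the library's actual code: the function name `backoff_ms` is hypothetical, and the 50 ms base delay is an assumption standing in for reconnect_backoff_ms. The point it illustrates is that the cap bounds the wait between attempts, not the total time spent retrying.

```python
import random


def backoff_ms(failures, base_ms=50, max_ms=1000, jitter=0.2, rng=random):
    """Delay before the next reconnect attempt, in milliseconds.

    failures: number of consecutive connection failures so far (>= 1).
    base_ms:  assumed initial delay (stands in for reconnect_backoff_ms).
    max_ms:   cap on the computed delay (reconnect_backoff_max_ms).
    """
    # Double the delay for each consecutive failure, but never exceed the cap.
    capped = min(base_ms * 2 ** (failures - 1), max_ms)
    # Spread reconnects over +/- 20% so clients do not all retry at once.
    return capped * rng.uniform(1 - jitter, 1 + jitter)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, round(backoff_ms(n)))
```

With these values the computed delay reaches the 1000 ms cap at the sixth consecutive failure and stays there, so a client retries roughly once per second for as long as the broker is unreachable.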

@georgehdd
Author

But why is there no notification to the application using the SDK that it's stuck in a reconnect loop?

@deepaktammali

Is there a way to get notified or identify when the connection breaks and the client keeps infinitely reconnecting?
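
One possible workaround, sketched here under stated assumptions rather than as an official answer: wrap the polling call in an application-level watchdog that raises once no records have arrived for a chosen window. The names `consume_with_idle_limit` and `IdleTimeout` are hypothetical. The helper only assumes a callable shaped like KafkaConsumer.poll, which accepts `timeout_ms` and returns a dict of partitions to record lists (empty when nothing arrived). Note the limitation: it cannot tell an unreachable cluster from a topic that is simply idle.

```python
import time


class IdleTimeout(Exception):
    """Raised when no records arrive within the allowed idle window."""


def consume_with_idle_limit(poll, max_idle_s, poll_timeout_ms=1000,
                            clock=time.monotonic):
    """Yield records from `poll`; raise IdleTimeout after `max_idle_s` of silence.

    poll:  callable shaped like KafkaConsumer.poll(timeout_ms=...).
    clock: injectable time source, which makes the helper easy to test.
    """
    last_seen = clock()
    while True:
        batches = poll(timeout_ms=poll_timeout_ms)
        if batches:
            # Records arrived: reset the idle timer and hand them out.
            last_seen = clock()
            for records in batches.values():
                yield from records
        elif clock() - last_seen > max_idle_s:
            raise IdleTimeout(f"no records for more than {max_idle_s} s")
```

With kafka-python this would be used as `for message in consume_with_idle_limit(consumer.poll, max_idle_s=60): ...`. A simpler alternative is the consumer_timeout_ms constructor option, which ends the `for message in consumer:` iteration after that many milliseconds without a message; it has the same limitation of not distinguishing an outage from an idle topic.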
