Fix topic error parsing in MetadataResponse #1997
Thanks for this. I remember this being tricky because there are both top-level error codes and per-topic error codes. I'll need to dig into the protocol docs before pulling this in to double-check some things. Also, I kinda wonder if we should be checking the type of request/response rather than just looking for the error code... i.e., this hack scales a bit, but as more request/response pairs come we may need to switch to another model, such as a callback where the request/response pair itself maintains the check of whether or not there's a failure. But in the meantime, this may be a simple hack for another little bit, I just want to double-check first.
@jeffwidman I agree, definitely worth a double-check and I like your idea about relying on the response type rather than If we're not sure about the semantics of the error codes in
Before:
    topic_error_tuples = (response.topic_errors if hasattr(response, 'topic_errors')
                          else response.topic_error_codes)
After:
    topic_error_tuples = []
    if hasattr(response, 'topic_errors'):
        topic_error_tuples.extend(response.topic_errors)
    elif hasattr(response, 'topic_error_codes'):
        topic_error_tuples.extend(response.topic_error_codes)
Just depends if you're more interested in a short-term bug fix or would rather dig into this more.
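To make the difference between the two variants concrete, here is a minimal sketch that wraps each one in a function and runs it against stand-in objects. The function names and the `SimpleNamespace` responses (including the `topics` field and the tuple contents) are assumptions for illustration only; they are not kafka-python's real response classes.

```python
from types import SimpleNamespace


def topic_errors_before(response):
    # Original logic: assumes one of the two attributes always exists.
    return (response.topic_errors if hasattr(response, 'topic_errors')
            else response.topic_error_codes)


def topic_errors_after(response):
    # Proposed logic: tolerate responses that carry neither attribute.
    topic_error_tuples = []
    if hasattr(response, 'topic_errors'):
        topic_error_tuples.extend(response.topic_errors)
    elif hasattr(response, 'topic_error_codes'):
        topic_error_tuples.extend(response.topic_error_codes)
    return topic_error_tuples


# Hypothetical stand-ins: one response with per-topic errors, one without
# either error attribute.
with_errors = SimpleNamespace(topic_errors=[('my-topic', 0, None)])
without_errors = SimpleNamespace(topics=[])

print(topic_errors_after(with_errors))
print(topic_errors_after(without_errors))

try:
    topic_errors_before(without_errors)
except AttributeError as exc:
    print('before-variant failed:', exc)
```

The trade-off is the one raised in this thread: the second variant no longer raises on an unexpected response shape, so a response with a new error field would pass through silently.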
Looked into this a bit more and this is good for now.
It's less brittle in that it now allows responses that use other fields for error codes without blowing up, although as a con it no longer blows up, so this may not get patched to watch errors on the new fields (i.e., not sure this PR would have happened if it was elif rather than else).
This whole thing is a bit hacky; it'd be better if the caller exposed a callback or something that _send_to_controller() could use to determine if a NotControllerError was thrown and the controller needs a refresh. That way we keep the error-parsing logic bundled together with the specific protocol parsing method rather than jumbled around. But that'd be a fairly large refactor, so that can wait till someone has the time/inclination. Maybe when that happens this can have unit tests to help verify the request/response parsing.
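One rough way to picture the callback idea is sketched below. This is not how kafka-python is implemented: `send_to_controller`, its parameters, and `create_topics_not_controller` are hypothetical names, and the responses are stand-in objects. The only protocol fact assumed is that `NOT_CONTROLLER` is error code 41.

```python
from types import SimpleNamespace

NOT_CONTROLLER = 41  # NOT_CONTROLLER error code in the Kafka protocol


def send_to_controller(send, refresh_controller, is_not_controller, tries=2):
    """Send a request, refreshing the controller whenever the caller-supplied
    callback reports a NOT_CONTROLLER failure. All callables are hypothetical."""
    for attempt in range(tries):
        response = send()
        if not is_not_controller(response):
            return response
        if attempt < tries - 1:
            refresh_controller()
    raise RuntimeError('controller still unknown after %d tries' % tries)


def create_topics_not_controller(response):
    # The error-parsing logic lives next to the response shape it understands:
    # here, (topic, error_code, error_message) tuples.
    return any(t[1] == NOT_CONTROLLER for t in response.topic_errors)


# Stand-in responses: the first reports a stale controller, the second succeeds.
responses = iter([
    SimpleNamespace(topic_errors=[('my-topic', NOT_CONTROLLER, None)]),
    SimpleNamespace(topic_errors=[('my-topic', 0, None)]),
])
refreshes = []

result = send_to_controller(
    send=lambda: next(responses),
    refresh_controller=lambda: refreshes.append(1),
    is_not_controller=create_topics_not_controller,
)
print(result.topic_errors, len(refreshes))
```

With this shape, adding a new request/response pair means writing one more small callback rather than extending a shared chain of `hasattr` checks.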
So merging.
Thank you again!
I couldn't figure out why this wasn't reported previously, until I realized that there are two paths for sending metadata request/responses. Normally we send to the least_loaded_node which has no error checking. But for the recently added Given that, I'd probably prefer to also check the protocol family / error code pairing as the number of pairs should be relatively small. And then blow up if it's not one of the expected pairs. That way we keep our error surface relatively constrained. Will leave for future enhancement if anyone is interested.
Actually, thinking about this one more time... I wonder if Metadata response ever even throws the If it doesn't, then Need to also investigate what the expected failure is if So this may need a bit more looking at.
I dug a bit further and realized that Java changed to querying the least-loaded-node rather than the controller for
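A minimal sketch of the "check the pairing, blow up otherwise" idea might look like the following. The three response classes and the `ERROR_FIELD_BY_RESPONSE` table are hypothetical stand-ins defined only for this example; which real response types belong in such a table would need to be confirmed against the protocol definitions.

```python
class CreateTopicsLike:
    # Hypothetical response shape exposing `topic_errors`.
    def __init__(self, topic_errors):
        self.topic_errors = topic_errors


class DeleteTopicsLike:
    # Hypothetical response shape exposing `topic_error_codes`.
    def __init__(self, topic_error_codes):
        self.topic_error_codes = topic_error_codes


class MetadataLike:
    # Hypothetical response shape that is not in the expected-pairs table.
    def __init__(self, topics):
        self.topics = topics


# The small, explicit set of expected (response type, error field) pairs.
ERROR_FIELD_BY_RESPONSE = {
    CreateTopicsLike: 'topic_errors',
    DeleteTopicsLike: 'topic_error_codes',
}


def topic_error_tuples(response):
    field = ERROR_FIELD_BY_RESPONSE.get(type(response))
    if field is None:
        # Fail loudly so an unexpected response type is noticed and handled.
        raise TypeError('no error field registered for %s'
                        % type(response).__name__)
    return list(getattr(response, field))


print(topic_error_tuples(CreateTopicsLike([('my-topic', 0, None)])))
print(topic_error_tuples(DeleteTopicsLike([('my-topic', 0)])))
```

Unlike the `hasattr` chain, an unregistered response type raises immediately, which keeps the error surface constrained at the cost of one table entry per new pair.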
It looks like AdminClient.describe_topics is failing because of how we're parsing the response for topic errors:
kafka-python/kafka/admin/client.py
Lines 376 to 383 in bbb8c90
Here's an example of what happens when I run describe_topics before the change:
And after the change: