-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(parsers.avro): Add Apache Avro parser plugin #11816
Conversation
bc1c25a
to
3ae9134
Compare
b6222b0
to
f6948dc
Compare
5ebe86e
to
abb157b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @athornton for reviving this parser! I have some comments, nothing too big. The only part that concerns me is the addition of time-formats. Please avoid adding those and rather add a round_timestamps_to
option in your parser, avoiding the combinatorics of formats and rounding.
@athornton please also rebase to latest master to get CircleCI back functional. |
abb157b
to
b1ae8be
Compare
Thank you for the detailed review. I'll get to work on it. I have no objection to a |
@srebhan : OK, I think I see conceptually what you're saying: all the convenience tools where I take the JSON representations of the schema and the message, and then call jsonToAvroMessage to generate the Avro format input, should be replaced by a binary input message (the output of jsonToAvroMessage) and a simple test of whether that works? Although I could put the schema or even both the schema and the message into telegraf.conf, in actual use the schema will be externally-given (almost always, it will come from a schema registry), and obviously the message is coming in over the wire. So it feels like we want a much simpler test, of "Avro format binary data" as the input...but if we want more test cases at some point, I don't want to throw away the tooling to create those messages, because in practice, generating the test data will be done by matching a schema and message and generating the Avro data from them, rather than generating the wire protocol by hand. Where should that tooling go? |
Exactly. Put the binary messages there and let the file input read them.
I don't think that tool should be in Telegraf. It's not Telegraf's task to create those messages. If we add further test-cases it will likely be based on bug-reports, so it would be nice if the parser could print the binary message it receives and maybe even the schema as debug messages on error. We have added this for a few other plugins, e.g. GNMI one to be able to reproduce problems in tests... |
OK. That's the approach I'll take, then. I'll make my own little tools repository to assemble the messages to binary format and put those in testcases. Something else I thought of and will probably add: since I allow the user, in telegraf.conf, to either specify the schema directly as a string, or as a schema registry endpoint, it's probably worth documenting that there's no reason the endpoint can't be a file:/// url if the user has an external schema file rather than an Avro schema registry. |
Hmm. It's not quite that simple: messages may arrive as raw Avro binary data, as Avro single-object-encoding data, or as Confluent wire format. So the parser will work on binary data, and if a parser registry is specified it will expect Confluent format. So no explanatory comment yet. However, I think I have the test suite rewritten now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome update @athornton! Just a few very small comments and then we are good to go I think...
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com> Apply review suggestions Update plugins/parsers/avro/parser_test.go Fail immediately if config or Init() error. Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
@srebhan I think we're ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for driving this PR @athornton! Good job!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@athornton - huge thank you for your persistence on this PR. I have some questions in line.
@athornton thanks for the updates, I think we are down to two open questions:
Thanks! |
So, I think we may be done? The schema lookup (when you have a schema registry) has to be done at each Parse(), but after the initial retrieval it's just a map lookup, so shouldn't be too costly. Using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for driving and contributing the new parser!
Required for all PRs
resolves #1630
This is a replacement for #7732 since the original author (@emanuele-falzone ) has gone silent.
This builds on Emanuele Falzone's work to allow ingestion from Avro serialized format. It can either connect to a schema registry or a schema can be specified in the parser.