feat(parsers.avro): Add Apache Avro parser plugin #11816

athornton · 2022-09-15T17:24:11Z

Required for all PRs

Updated associated README.md.
Wrote appropriate unit tests.
Pull request title or commits are in conventional commit format

resolves #1630

This replaces #7732, since the original author (@emanuele-falzone) has gone silent.

This builds on Emanuele Falzone's work to allow ingestion of Avro-serialized data. The parser can either fetch the schema from a schema registry or use a schema specified in the parser configuration.

srebhan

Thanks @athornton for reviving this parser! I have some comments, nothing too big. The only part that concerns me is the addition of time-formats. Please avoid adding those and rather add a round_timestamps_to option in your parser, avoiding the combinatorics of formats and rounding.

internal/internal.go

plugins/parsers/avro/parser.go

plugins/parsers/avro/parser_test.go

config/config.go

plugins/parsers/registry.go

plugins/parsers/registry_test.go

srebhan · 2022-09-20T16:49:57Z

@athornton please also rebase to latest master to get CircleCI back functional.

athornton · 2022-09-20T20:36:29Z

Thank you for the detailed review. I'll get to work on it. I have no objection to a round_timestamps_to config item rather than my initial implementation (the reason to round at all basically comes down to https://docs.influxdata.com/influxdb/v2.4/reference/faq/#does-the-precision-of-the-timestamp-matter , and I at least find it much easier to eyeball the data if all digits past the precision we care about are zero rather than the deterministic-but-basically-random stuff that the conversion gives us).

athornton · 2023-02-27T18:07:49Z

@srebhan :

OK, I think I see conceptually what you're saying: all the convenience tools where I take the JSON representations of the schema and the message, and then call jsonToAvroMessage to generate the Avro format input, should be replaced by a binary input message (the output of jsonToAvroMessage) and a simple test of whether that works? Although I could put the schema or even both the schema and the message into telegraf.conf, in actual use the schema will be externally-given (almost always, it will come from a schema registry), and obviously the message is coming in over the wire.

So it feels like we want a much simpler test, of "Avro format binary data" as the input...but if we want more test cases at some point, I don't want to throw away the tooling to create those messages, because in practice, generating the test data will be done by matching a schema and message and generating the Avro data from them, rather than generating the wire protocol by hand. Where should that tooling go?

srebhan · 2023-02-28T14:35:53Z

> OK, I think I see conceptually what you're saying: all the convenience tools where I take the JSON representations of the schema and the message, and then call jsonToAvroMessage to generate the Avro format input, should be replaced by a binary input message (the output of jsonToAvroMessage) and a simple test of whether that works? Although I could put the schema or even both the schema and the message into telegraf.conf, in actual use the schema will be externally-given (almost always, it will come from a schema registry), and obviously the message is coming in over the wire.

Exactly. Put the binary messages there and let the file input read them.

> So it feels like we want a much simpler test, of "Avro format binary data" as the input...but if we want more test cases at some point, I don't want to throw away the tooling to create those messages, because in practice, generating the test data will be done by matching a schema and message and generating the Avro data from them, rather than generating the wire protocol by hand. Where should that tooling go?

I don't think that tool should be in Telegraf. It's not Telegraf's task to create those messages. If we add further test cases, they will likely be based on bug reports, so it would be nice if the parser could print the binary message it receives, and maybe even the schema, as debug messages on error. We have added this to a few other plugins, e.g. the gNMI one, to be able to reproduce problems in tests.

athornton · 2023-02-28T16:00:10Z

OK. That's the approach I'll take, then. I'll make my own little tools repository to assemble the messages to binary format and put those in testcases. Something else I thought of and will probably add: since I allow the user, in telegraf.conf, to either specify the schema directly as a string, or as a schema registry endpoint, it's probably worth documenting that there's no reason the endpoint can't be a file:/// url if the user has an external schema file rather than an Avro schema registry.

athornton · 2023-03-01T02:24:37Z

Hmm. It's not quite that simple: messages may arrive as raw Avro binary data, as Avro single-object-encoding data, or in Confluent wire format. So the parser will work on binary data, and if a schema registry is specified it will expect Confluent format. So no explanatory comment yet.

However, I think I have the test suite rewritten now.
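The three framings mentioned above differ only in their leading bytes. The sketch below is illustrative and not taken from the PR: Confluent wire format starts with a zero magic byte followed by a 4-byte big-endian schema ID, and Avro single-object encoding starts with the marker `0xC3 0x01` followed by an 8-byte little-endian schema fingerprint. Raw Avro binary has no header, so it can begin with any bytes, and this check is therefore a heuristic. The thread indicates the parser instead chooses the framing from its configuration. All identifiers here are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type framing int

const (
	framingUnknown      framing = iota // raw Avro binary, or anything else
	framingConfluent                   // 0x00 + 4-byte big-endian schema ID
	framingSingleObject                // 0xC3 0x01 + 8-byte little-endian fingerprint
)

// detectFraming inspects the leading bytes of msg and returns the apparent
// framing together with the schema ID or fingerprint found in the header.
func detectFraming(msg []byte) (framing, uint64) {
	switch {
	case len(msg) >= 10 && msg[0] == 0xC3 && msg[1] == 0x01:
		return framingSingleObject, binary.LittleEndian.Uint64(msg[2:10])
	case len(msg) >= 5 && msg[0] == 0x00:
		return framingConfluent, uint64(binary.BigEndian.Uint32(msg[1:5]))
	default:
		return framingUnknown, 0
	}
}

func main() {
	// Confluent header for schema ID 42, followed by one byte of payload.
	kind, id := detectFraming([]byte{0x00, 0x00, 0x00, 0x00, 0x2A, 0x02})
	fmt.Println(kind == framingConfluent, id)
}
```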

srebhan

Awesome update @athornton! Just a few very small comments and then we are good to go I think...

plugins/parsers/avro/parser.go

plugins/parsers/avro/parser_test.go

plugins/parsers/avro/schema_registry.go

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Apply review suggestions
Update plugins/parsers/avro/parser_test.go
Fail immediately if config or Init() error.
Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>

athornton · 2023-03-01T19:12:31Z

@srebhan I think we're ready.

srebhan

Thank you very much for driving this PR @athornton! Good job!

powersj

@athornton - huge thank you for your persistence on this PR. I have some questions in line.

plugins/parsers/avro/README.md

plugins/parsers/avro/parser.go

plugins/parsers/avro/README.md

plugins/parsers/avro/parser.go

powersj · 2023-03-02T16:59:06Z

@athornton thanks for the updates, I think we are down to two open questions:

The purpose of DefaultTags versus using the built-in method for defining tags for an input.
Whether getSchemaAndCodec should be run on every Parse.

Thanks!

athornton · 2023-03-02T17:47:43Z

So, I think we may be done? The schema lookup (when you have a schema registry) has to be done at each Parse(), but after the initial retrieval it's just a map lookup, so shouldn't be too costly. Using toml:"tags" for the default tags looks like it should work.
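The "fetch once, then map lookup" behaviour described here can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `schemaCache`, its fields, and the `fetch` callback are hypothetical, and the mutex is an added precaution in case the parser is called concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// schemaCache (hypothetical) remembers schemas by registry ID so that only
// the first lookup for an ID pays the cost of contacting the registry.
type schemaCache struct {
	mu      sync.Mutex
	schemas map[int]string
	fetch   func(id int) (string, error) // stands in for the registry request
}

// get returns the cached schema for id, fetching and storing it on a miss.
func (c *schemaCache) get(id int) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if s, ok := c.schemas[id]; ok {
		return s, nil // cheap path: plain map lookup
	}
	s, err := c.fetch(id)
	if err != nil {
		return "", fmt.Errorf("fetching schema %d: %w", id, err)
	}
	c.schemas[id] = s
	return s, nil
}

func main() {
	calls := 0
	cache := &schemaCache{
		schemas: make(map[int]string),
		fetch: func(id int) (string, error) {
			calls++
			return `{"type":"string"}`, nil
		},
	}

	// Two lookups for the same ID trigger only one fetch.
	_, _ = cache.get(7)
	_, _ = cache.get(7)
	fmt.Println("registry fetches:", calls)
}
```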

powersj

Thanks for driving and contributing the new parser!

telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Sep 15, 2022

athornton force-pushed the features/avro branch from bc1c25a to 3ae9134 Compare September 15, 2022 17:24

athornton mentioned this pull request Sep 15, 2022

Apache Avro parser plugin #7732

Closed

3 tasks

athornton force-pushed the features/avro branch 14 times, most recently from b6222b0 to f6948dc Compare September 18, 2022 19:26

athornton mentioned this pull request Sep 18, 2022

feat(inputs.kafka_consumer): Add regular expression support for topics #11831

Merged

3 tasks

athornton changed the title ~~feat(plugins/parser): add Apache Avro parsing~~ feat(plugins.parser): add Apache Avro parsing Sep 19, 2022

athornton force-pushed the features/avro branch 4 times, most recently from 5ebe86e to abb157b Compare September 20, 2022 15:56

srebhan reviewed Sep 20, 2022

srebhan self-assigned this Sep 20, 2022

srebhan added the plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins label Sep 20, 2022

srebhan changed the title ~~feat(plugins.parser): add Apache Avro parsing~~ feat(parsers.avro): Add Apache Avro parser plugin Sep 20, 2022

athornton force-pushed the features/avro branch from abb157b to b1ae8be Compare September 20, 2022 18:33

rewrite test suite

87e20a3

athornton force-pushed the features/avro branch from 18c8054 to 87e20a3 Compare March 1, 2023 03:11

srebhan reviewed Mar 1, 2023

athornton force-pushed the features/avro branch from 4721ab6 to 29424c8 Compare March 1, 2023 18:47

srebhan approved these changes Mar 1, 2023

srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Mar 1, 2023

srebhan assigned powersj and unassigned srebhan Mar 1, 2023

powersj reviewed Mar 1, 2023

Address review suggestions

a557cb7

athornton force-pushed the features/avro branch from 3e4aaf6 to a557cb7 Compare March 2, 2023 05:10

powersj reviewed Mar 2, 2023

Fix time formatting

361cf4f

athornton force-pushed the features/avro branch from 20dddc7 to 361cf4f Compare March 2, 2023 16:16

athornton added 2 commits March 2, 2023 10:35

Further review suggestions

f26b988

WIP

b0975d7

athornton requested a review from powersj March 2, 2023 17:47

powersj approved these changes Mar 2, 2023

powersj merged commit acd1500 into influxdata:master Mar 2, 2023

athornton deleted the features/avro branch March 2, 2023 18:26

pranavnag18 mentioned this pull request Mar 28, 2023

Avro processor doesnt support Unions #12970

Closed

srebhan added this to the v1.26.0 milestone Jun 21, 2023
