Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow customization of allowed chars in graphite output #12832

Closed
maxbeutel opened this issue Mar 10, 2023 · 13 comments · Fixed by #12835 or #12908
Closed

Allow customization of allowed chars in graphite output #12832

maxbeutel opened this issue Mar 10, 2023 · 13 comments · Fixed by #12835 or #12908
Labels
feature request Requests for new plugin and for new features to existing plugins

Comments

@maxbeutel
Copy link
Contributor

maxbeutel commented Mar 10, 2023

Use Case

  • I would like to use telegraf to feed data into an internal system which uses the graphite protocol.
  • However, all metrics shipped to this system must have a prefix in the form of |my_prefix| (yep, that's right, it needs pipes.)

Expected behavior

Specify a list of (additional?) safe chars that don't get stripped when using Graphite output, such that a metric like

my.|prefix|.localhost.cpu0.us-west-2.cpu.usage_idle

Can be sent to the graphite socket (and | pipe chars don't get escaped)

Actual behavior

Currently pipes | get replaced with _ in the graphite output due to the regex

var strictAllowedChars          = regexp.MustCompile(`[^a-zA-Z0-9-:._=\p{L}]`)`

in plugins/serializers/graphite/graphite.go.

That is, a metric in the form of

my.|prefix|.localhost.cpu0.us-west-2.cpu.usage_idle

will be rewritten to

my._prefix_.localhost.cpu0.us-west-2.cpu.usage_idle

Additional info

I'm OK to raise a Pull Request if there's a chance to get this merged.

Ideas:

  • Introduce an additional config option to make the list of safe chars configurable
  • OR: Introduce an additional config option to allow a list of safe chars that get appended into the regex pattern of strictAllowedChars used to escape the metrics
@maxbeutel maxbeutel added the feature request Requests for new plugin and for new features to existing plugins label Mar 10, 2023
@powersj
Copy link
Contributor

powersj commented Mar 10, 2023

Hi,

Thanks for the feature request. I received another request through our support channels for this same thing :) so I was looking at it recently.

I think adding a new option for the non-tag enabled path is the right path. My hesitation thus far has been around trying to understand why we are sanitizing characters in the first place and if there was a published spec about valid characters. The only thing I have seen was around Graphite tag name and values, not the metric path.

Today the non-tag path, uses the strict sanitation function, which will:

  1. Replace /, @, and * with _
  2. Replace .. with .
  3. Remove \
  4. Replace any character not matching [^a-zA-Z0-9-:._=\p{L}]

Character could still get sanitized by the first few replace/removes even with a custom setting, although I think that might be ok. Is that your understanding as well?

If so, I think a new graphite_strict_sanitize_regex option that defaults to the current value is something that could be helpful. Again noting that some characters, may still get removed or replaced.

What do you think?

@srebhan
Copy link
Member

srebhan commented Mar 10, 2023

@powersj reading deeper into the graphite issue I finally found the definition in the code suggesting that all printable characters, i.e.

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~

are allowed for the metric name while the following characters need to be escaped

(){},.'"\|

@maxbeutel
Copy link
Contributor Author

Yeah I think there should be a distinction between

  1. Cleaning tags, which seem to have a relatively small list of forbidden values, from https://graphite.readthedocs.io/en/latest/tags.html#carbon: "Tag names must have a length >= 1 and may contain any ascii characters except ;!^=" (which would mean to me using these chars in a tag name would never work.)
  2. Cleaning the metric, where @srebhan seems to have found the reference.

Regarding (2), based on sending metrics to graphite in our place, I was able to send data using nc in the form of foo.|bar|.baz 123 without escaping the | at all.

At the moment the full metric string is passed througth strictSanitize, which is indeed quite strict in sanitizing (as described above) and not customizable.

So for me it'd make sense if there's an option to customize the sanitizing/escaping, but have a sane default value.

powersj added a commit to powersj/telegraf that referenced this issue Mar 10, 2023
The graphite serializer is very strict as to what characters are allowed
to pass through. This allows the user to override the regex used and
specify their own.

fixes: influxdata#12832
@powersj
Copy link
Contributor

powersj commented Mar 10, 2023

@maxbeutel in 20-30mins there will be artifacts, which you can use to test the new behavior in #12835. I think @srebhan and I would still need to talk through if this is the best way internally to make the change, but the PR exposes the new graphite_strict_sanitize_regex option which you can try out.

Thanks!

@maxbeutel
Copy link
Contributor Author

maxbeutel commented Mar 10, 2023

Thanks for the quick response!

Unfortunately (or, lucky for me) I'm travelling for a while now so I can only test it out after that.
But based on the unit test it looks like this will fix the issue indeed, the generated metric looks good.

@srebhan
Copy link
Member

srebhan commented Mar 13, 2023

@powersj do we really need to be able to customize the regexp? How about having a "strict", "standard" and "none" setting for customization?

@powersj
Copy link
Contributor

powersj commented Mar 13, 2023

@srebhan that is a route I thought about as well. A couple thoughts:

  1. I do not think "none" should ever be an option. There are still some characters like the period we should remove or slash we should escape in the metric path
  2. If we offer a "standard", I feel like we will have this conversation again at a later date, when a user finds an exception.
  3. By giving the user the regex, it is in their hands to do what they want and we can avoid any odd graphite format or differences between the various parsers that exist.

What do you think?

@maxbeutel
Copy link
Contributor Author

@powersj @srebhan can we consider to reopen this? The current implementation doesn't actually work using the following toml config:

[[outputs.graphite]]
  servers = ["localhost:2003"]
  prefix = "test.|foobar|.me"
  graphite_tag_support = true
  graphite_strict_sanitize_regex = '...'

Because in https://github.com/influxdata/telegraf/blob/master/plugins/outputs/graphite/graphite.go#L168 the config option for the regex isn't passed (the third argument, the empty string, should be the sanitize regex option.)

s, err := serializers.NewGraphiteSerializer(g.Prefix, g.Template, "", g.GraphiteTagSupport, g.GraphiteTagSanitizeMode, g.GraphiteSeparator, g.Templates)

The fix would be to introduce the config option in type Graphite like so

GraphiteStrictRegex string toml:"graphite_strict_sanitize_regex"

and then pass it instead of the empty string.

@powersj powersj reopened this Mar 20, 2023
powersj added a commit to powersj/telegraf that referenced this issue Mar 20, 2023
Missed adding this correctly to the graphite output, only the parser in
a previous PR.

fixes: influxdata#12832
powersj added a commit to powersj/telegraf that referenced this issue Mar 20, 2023
Missed adding this correctly to the graphite output, only the parser in
a previous PR.

fixes: influxdata#12832
@powersj
Copy link
Contributor

powersj commented Mar 20, 2023

Ah right, I added it to the parser not the output. Can you try out the artifacts in PR #12908 in 20-30mins?

Thanks!

@powersj powersj added the waiting for response waiting for response from contributor label Mar 20, 2023
@maxbeutel
Copy link
Contributor Author

maxbeutel commented Mar 21, 2023

Looks good now, I tested it by adding | to the existing strict regex list (so basically allowing |) and now the metrics are pushed keeping | in the graphite output format. Thank you!

Edit: Will this be part of the 1.27.0 release? Is that due this week?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 21, 2023
@powersj
Copy link
Contributor

powersj commented Mar 21, 2023

Edit: Will this be part of the 1.27.0 release? Is that due this week?

We will include this and the other PR in v1.26.1 on or around March 3

@maxbeutel
Copy link
Contributor Author

I guess you mean April 3? : )

Thank you again for implementing this option, helps us a lot.

@powersj
Copy link
Contributor

powersj commented Mar 21, 2023

I guess you mean April 3? : )

lol - yes, where did this month go?!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requests for new plugin and for new features to existing plugins
Projects
None yet
3 participants