Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow setting granularity for summary metrics #290

Merged
merged 2 commits into from
Feb 20, 2020

Conversation

tgummerer
Copy link
Contributor

The Go client for prometheus aggregates summary metrics over 10
minutes by default, in 5 buckets. This is not always the behaviour we
want.

Allow tweaking those settings in statsd_exporter, so we can
aggregate summary metrics over more or less time, with more or fewer
buckets, and set the cap for the bucket as well.

/cc @matthiasr

@matthiasr
Copy link
Contributor

Can you explain more why you need to tweak this?

In general, I would rather encourage using histograms than add more features for summaries. With histograms, these decisions can be made at query time, with more options for aggregation too.

@tgummerer
Copy link
Contributor Author

Sure. This is for a metric for the response size of a request, where we sometimes get a very high response size for a few requests, and then a smaller one for others. Sometimes there can be enough requests with a large enough size that they dominate the graph for the whole 10 minutes that the max age is set to by default, which is not what we want.

The summary metrics instead of histograms is necessary because the values don't fit into histogram buckets nicely because there is quite a big spread from the smallest to the biggest values.

@matthiasr
Copy link
Contributor

You can add more/larger histogram buckets … but I see your point.

I'm okay with adding this, but I would like to make it clearer in the mapper config that these settings are only relevant to summaries. What do you think about moving them into a summary_config sub-structure? I would also like to use more descriptive keys in the YAML form.

What happens if they are not set?

Can you please add tests that include these settings? Just that they get parsed from the mapper correctly, even in mixed configurations.

@tgummerer
Copy link
Contributor Author

What do you think about moving them into a summary_config sub-structure? I would also like to use more descriptive keys in the YAML form.

I'm fine with moving them into a summary_config sub-structure, but didn't do that for now because the other settings such as quantiles that also only apply to summaries are not in a sub-structure either, so it would feel a bit odd having the new settings there. The buckets setting is similar for histograms.

But if moving them into a summary_config substructure is your preference here I don't mind. If we are introducing that should we allow the quantiles to be set inside that substructure as well?

What happens if they are not set?

If they are not set, they will be assigned the default settings here, which matches what happens currently.

Can you please add tests that include these settings? Just that they get parsed from the mapper correctly, even in mixed configurations.

Sure, I'll try to add some tests.

@matthiasr
Copy link
Contributor

ah good point about the existing quantiles setting … now I'm thinking about consistency and moving those into sub structures too … the options I see are

  1. keep everything flat
    1. with the current names
    2. add summary_ prefix to the new settings
    3. summary_ and histogram_ prefix to new and existing settings (breaking change?)
  2. move settings into substructure (breaking change?)
    1. only for summaries, keep "buckets" flat
    2. for summaries and quantiles

I am willing to break compatibility, but I don't want to do that repeatedly, so I would either keep compatibility (1.i,1.ii) or go all the way and restructure these settings all at once (2.ii). In the latter case, we could also use the presence of summary/histogram settings to decide which one the user prefers. And if we're breaking that we may as well make histograms the default …

As a user, how would you want the mapping to look?
How difficult would it be to maintain compatibility with older configs (translate into the new struct automatically, emit a warning)?
How much work are you willing to put into this?

@tgummerer
Copy link
Contributor Author

Thanks for writing this down. I can't think of any other options either.

As a user, how would you want the mapping to look?

I think I would prefer either option 1.i (though I'm having a hard time naming the settings tbh), and rely on the timer_type to tell me what the options mean, or go straight to option 2.ii. If we stick to option 1.i, we could emit warnings if options that don't match the timer_type are given to make things safer?

How difficult would it be to maintain compatibility with older configs (translate into the new struct automatically, emit a warning)?

Not entirely sure, but I don't think it would be too much work. I'm happy to give that a try if that's what we decide we want to go for. I think compatibility here is somewhat important, possibly more than whether histograms or summary is the default type, as I can't see a good transition that doesn't involve updating the config and the binary at the same type for this change (but I think it can be done for histogram vs summary switchovers).

How much work are you willing to put into this?

Some hours I think. I would like to do the right thing here, that we and the users are going to be happy about.

@matthiasr
Copy link
Contributor

is updating the config and binary at the same time a problem? I was working under the assumption that that's the easy way 😅

@tgummerer
Copy link
Contributor Author

I think it might be tricky in some cases, especially if there are multiple teams involved that might have their own metrics. I think for our usecase it might actually be possible, though not trivial. Not sure how others manage the config/binary though.

@tgummerer
Copy link
Contributor Author

I implemented option 2.ii now, making it backwards compatible as well, with a deprecation warning. I started off implementing option 1.i, but then I tried implementing warnings for using the wrong options for the wrong timer_type, and it seemed to get hard to explain to users. So I think separate hierarchies might actually be best.

The code could still use some cleanup, and I didn't exactly make this a nice series of commits. Also the docs still need to be updated for the new hierarchies. I'm happy to do all that, but would first like to get some feedback on this approach. If you're happy with this option I'll clean it up a bit, including the docs.

@matthiasr
Copy link
Contributor

matthiasr commented Jan 30, 2020 via email

@tgummerer
Copy link
Contributor Author

@matthiasr friendly ping? I updated the docs as well now, and I'm happy to turn this into a nicer series of commits if you prefer that.

@matthiasr
Copy link
Contributor

I'm sorry, work was a bit much this week and I didn't have enough focus to review this. I didn't forget about it!

@tgummerer
Copy link
Contributor Author

No worries, totally understand that. I just wanted to make sure it's not forgotten, but it's no problem if you need to hold of on taking a look at this for a bit longer 💖

Copy link
Contributor

@matthiasr matthiasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Only one thing that I would like to change: I would rather have a clean break in the mapper library interface than the ambiguity of mapping.Quantiles and mapping.SummaryOptions.Quantiles both being maybe-present (same for buckets).

To achieve that clean break (while keeping the configuration compatible), I would

  • rename the Buckets and Quantiles fields in the Mapping type (LegacyX or something) to make compilation fail for library users that rely on them
  • copy from these legacy fields into SummaryOptions/HistogramOptions rather than out
  • always return mappings with SummaryOptions/HistogramOptions non-nil (does it even have to be a pointer?) so that it's safe to use mapping.SummaryOptions.Quantiles

That way, the interface is unambiguous (don't use the legacy fields, use the Options) and there is a clear migration path for library users (s/\.Quantiles/.SummaryOptions&/g;s/\.Buckets/.Histogramoptions&/g). The configuration would still be compatible because the YAML field name doesn't need to change.

Currently the Buckets and Quantiles settings are top level settings per metric
in the yaml.  In a subsequent commit we're going to allow adding more such
options for summaries, at which point having them all at the top level gets
confusing.

Split the options out into separate hierarchies to allow adding more options,
without adding confusion.  We preserve backwards compatibility by still
accepting the old option, but warning when it is present.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
The Go client for prometheus aggregates summary metrics over 10
minutes by default, in 5 buckets.  This is not always the behaviour we
want.

Allow tweaking those settings in `statsd_exporter`, so we can
aggregate summary metrics over more or less time, with more or fewer
buckets, and set the cap for the bucket as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
@tgummerer tgummerer force-pushed the tg/allow-setting-max-age branch from 15096ed to dae5d78 Compare February 18, 2020 18:06
@tgummerer
Copy link
Contributor Author

Thanks for the review! I updated things according to your suggestions, so I think this is ready for another round of reviews. I also rebased the changes to make two nicer commits, and added my sign-off.

always return mappings with SummaryOptions/HistogramOptions non-nil (does it even have to be a pointer?) so that it's safe to use mapping.SummaryOptions.Quantiles

I do think it has to be a pointer, because that's how we detect it not being set to emit warnings, etc. For now it is only safe to use them when the timer type corresponds to the options type, but I can always make them non-nil if you prefer.

Copy link
Contributor

@matthiasr matthiasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@matthiasr matthiasr merged commit b4e9e95 into prometheus:master Feb 20, 2020
@tgummerer
Copy link
Contributor Author

Thanks for the reviews ✨

@tgummerer tgummerer deleted the tg/allow-setting-max-age branch February 20, 2020 13:19
matthiasr pushed a commit that referenced this pull request Feb 20, 2020
Signed-off-by: Matthias Rampke <mr@soundcloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants