Why default to summary rather than histogram? #460
Comments
Probably historical. A number of Prometheus things from the very early days defaulted to Summary. |
Hey, that's a good question. Histograms in Prometheus have a few main disadvantages that prevent them from being a useful default for statsd.
The first and biggest downside is that histograms require some knowledge of what's being measured and of the expected distribution in order to set decent bucket boundaries. Imagine you have one set of timings expected to cluster around a few milliseconds, and another that clusters around a few seconds. With a generic histogram using the default buckets, neither set would produce accurate data. However, if you know these distributions, you can create buckets that give you meaningful percentiles.
Second, histograms have a higher cardinality than summaries, especially if you try to measure something with a wide distribution of values. Given that we don't know what kind of timings people will send, we'd need a very wide set of buckets to have meaningful histograms by default in the statsd exporter. This causes more load on Prometheus.
Finally, summaries are accurate and produce meaningful data out of the box for any timing, regardless of the distribution, since they directly calculate percentiles. Histograms use linear interpolation between bucket boundaries to get a percentile value, which inherently has error baked in that some people don't necessarily consider.
This may change once Prometheus supports sparse histograms, which significantly improve on these limitations.
TL;DR Summaries are cheaper and more accurate for unknown distributions than histograms, which currently require some knowledge of the expected distribution. |
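To make the interpolation error concrete, here is a minimal Python sketch. It assumes the commonly used default bucket bounds from the Prometheus client libraries (0.005 s up to 10 s) and a made-up set of timings spread evenly between 1 ms and 3 ms. The helper names are hypothetical, and the interpolation only mirrors the idea behind `histogram_quantile()`, not its exact implementation.

```python
import bisect
import math

# Assumed default bucket upper bounds, in seconds.
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def cumulative_counts(values, bounds):
    """Cumulative observation counts per upper bound, plus a final +Inf bucket."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        # First bucket whose upper bound is >= v ("less than or equal" buckets).
        counts[bisect.bisect_left(bounds, v)] += 1
    cumulative, total = [], 0
    for c in counts:
        total += c
        cumulative.append(total)
    return cumulative


def estimate_quantile(q, bounds, cumulative):
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    rank = q * cumulative[-1]
    i = next(idx for idx, c in enumerate(cumulative) if c >= rank)
    if i == len(bounds):
        return bounds[-1]  # rank falls into +Inf: report the highest finite bound
    lower = bounds[i - 1] if i > 0 else 0.0
    below = cumulative[i - 1] if i > 0 else 0
    return lower + (bounds[i] - lower) * (rank - below) / (cumulative[i] - below)


def exact_quantile(q, values):
    """Exact quantile taken from the sorted raw observations."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


# Hypothetical timings: 1000 observations spread evenly over 1 ms .. 3 ms.
timings = [0.001 + 0.002 * i / 999 for i in range(1000)]
cumulative = cumulative_counts(timings, DEFAULT_BUCKETS)

for q in (0.5, 0.99):
    print(q, exact_quantile(q, timings), estimate_quantile(q, DEFAULT_BUCKETS, cumulative))
```

Every observation lands in the first bucket, so the estimate is interpolated across the whole 0 to 5 ms range and drifts away from the real value, most visibly at the high percentiles.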
The big downside of summaries is that they can't be aggregated. If you have more than one statsd_exporter receiving data from the same app(s), the data will be essentially useless. |
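A small Python sketch of why this matters, using made-up traffic for two hypothetical exporter instances: averaging per-instance quantiles does not give the quantile of the combined traffic, while histogram bucket counts can simply be added. The bucket bounds and values are assumptions chosen for illustration.

```python
import math


def quantile(q, values):
    """Exact quantile from raw observations (what a summary approximates per instance)."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def bucket_counts(values, bounds):
    """Cumulative histogram counts: observations less than or equal to each bound."""
    return [sum(1 for v in values if v <= b) for b in bounds]


# Hypothetical timings in seconds, split unevenly across two exporters.
instance_a = [0.010] * 950 + [0.050] * 50   # mostly fast requests
instance_b = [0.200] * 50 + [2.000] * 50    # fewer, much slower requests

# Averaging the per-instance p90 values, which is all summaries allow.
averaged_p90 = (quantile(0.9, instance_a) + quantile(0.9, instance_b)) / 2

# The p90 of the combined traffic.
true_p90 = quantile(0.9, instance_a + instance_b)

# Histogram buckets, by contrast, merge by element-wise addition.
BOUNDS = [0.01, 0.05, 0.25, 2.5]
merged = [a + b for a, b in zip(bucket_counts(instance_a, BOUNDS),
                                bucket_counts(instance_b, BOUNDS))]

print("averaged p90:", averaged_p90)
print("true p90:    ", true_p90)
print("merged buckets match combined data:",
      merged == bucket_counts(instance_a + instance_b, BOUNDS))
```

With these numbers the averaged p90 comes out around one second while the true p90 is 50 ms, because the average ignores how many observations each instance saw.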
I thought about changing the default in the past but never tackled that. With Histograms v2 in the works, I would rather not change the default now – they will alleviate a lot of the "must pick buckets" pain, and if we can make one breaking change rather than multiple all the better. |
Now that native histograms are more stable, I would +1 making them the default in the next major release. I have been using them as my default and can only recommend it: the level of detail you get is impressive. |
They're "more stable" but still experimental 😅 We still need a text format (prometheus/proposals#32), and it's behind a feature flag in Prometheus itself. Let's wait until it is really stable 😉 |
With Prometheus 3.0 just around the corner, let's do this 😄 What should the default configuration be? Should we still include some default classic histogram buckets, or only an exponential histogram factor? |
I would leave the default standard timer bucket set. |
I would also be fine with keeping buckets, especially with the work around Native Histogram with Custom Buckets (NHCB) going on. Regarding the native histogram configuration, I would just make sure we set the max buckets option. I have seen it show up in profiles when the number is too permissive (anything over 500). |
Oh yes, very good point. OTel has set a precedent with [160 buckets by default](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#base2-exponential-bucket-histogram-aggregation), so unless there's a strong reason for another specific number I would stick with that.
(Sent by email in reply to Pedro Tanaka's comment of Fri, 8 Nov 2024, 17:45, above.)
|
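For a rough feel of what a bucket cap means in practice, the following Python sketch estimates how fine a base-2 exponential resolution fits under a given maximum. It assumes bucket boundaries grow by a factor of `2 ** (2 ** -scale)`, a scale range of -4 to 8 (the schema range of Prometheus native histograms, as far as I know), positive observations only, and a hypothetical timing range of 1 ms to 100 s. The function names are illustrative and not part of any library.

```python
import math


def approx_buckets(lo, hi, scale):
    """Approximate bucket count needed to span [lo, hi] at the given scale.

    Boundary alignment can add one more bucket in practice.
    """
    return math.ceil(math.log2(hi / lo) * 2 ** scale)


def finest_scale(lo, hi, max_buckets, max_scale=8, min_scale=-4):
    """Highest resolution (largest scale) whose bucket count fits under the cap."""
    for scale in range(max_scale, min_scale - 1, -1):
        if approx_buckets(lo, hi, scale) <= max_buckets:
            return scale
    return min_scale


# Hypothetical range of timings: 1 ms up to 100 s.
LOW, HIGH = 0.001, 100.0

for cap in (160, 500):
    scale = finest_scale(LOW, HIGH, cap)
    factor = 2 ** (2 ** -scale)
    print(f"max_buckets={cap}: scale={scale}, growth factor={factor:.4f}, "
          f"buckets={approx_buckets(LOW, HIGH, scale)}")
```

Under these assumptions a cap of 160 still allows a growth factor of about 1.09 between adjacent bucket boundaries across five orders of magnitude, so the lower cap costs little resolution for typical timings.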
What is the reason behind converting the metric to a summary rather than a histogram by default?