Support "set" metric type #183
Comments
With option 3, what would the consequence be of setting this interval to a reaaaally long time?
On Tue, Feb 5, 2019, 20:40 Bryan Larsen wrote:
In statsd, the set type counts unique occurrences between flushes. This is not supported in statsd_exporter because "between flushes" is pretty meaningless in a pull system, especially if there are multiple servers scraping the exporter.
Originally requested in #112
@matthiasr requested opening a new issue if somebody had a good idea on how to implement Set. I wouldn't say that I have a good idea, but I do have some thoughts.
Option 1: assume single scraping server. Not a great solution, but would be sufficient for us, at least at the moment.
Option 2: create a statsd plugin that sends sets as gauges on flush. Requires the use of the statsd daemon.
Option 3: add an option for the flush interval; create a ticker from it that persists and resets the set counts every tick. If the option isn't set it could have a default, or it could just mean the user doesn't need set support.
Option 3 seems the best option, but option 2 is definitely easier and good enough for us. So we'd love to help with option 3, but if there's no interest we'd probably just go ahead and do option 2 ourselves.
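A minimal sketch of what option 3 could look like, assuming a map-backed store whose members are published and reset on every tick. `SetStore`, `Add`, `Flush`, `Count`, `Run`, and the metric name `workers_idle` are hypothetical names for illustration and are not taken from statsd_exporter.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// SetStore is a hypothetical holder for statsd-style sets.
type SetStore struct {
	mu      sync.Mutex
	members map[string]map[string]struct{} // set name -> unique values seen this interval
	counts  map[string]int                 // set name -> count published at the last flush
}

func NewSetStore() *SetStore {
	return &SetStore{
		members: make(map[string]map[string]struct{}),
		counts:  make(map[string]int),
	}
}

// Add records one occurrence of value in the named set.
func (s *SetStore) Add(name, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.members[name] == nil {
		s.members[name] = make(map[string]struct{})
	}
	s.members[name][value] = struct{}{}
}

// Flush publishes the unique count of every known set and resets its members.
// A set that received nothing during an interval is published as 0.
func (s *SetStore) Flush() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for name, vals := range s.members {
		s.counts[name] = len(vals)
		s.members[name] = make(map[string]struct{})
	}
}

// Count returns the value a scrape would read: the count from the last flush.
func (s *SetStore) Count(name string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.counts[name]
}

// Run flushes on every tick of interval until ctx is cancelled.
func (s *SetStore) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			s.Flush()
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	store := NewSetStore()
	store.Add("workers_idle", "w1")
	store.Add("workers_idle", "w2")
	store.Add("workers_idle", "w1") // duplicate, counted once

	fmt.Println(store.Count("workers_idle")) // 0: nothing is published before the first flush
	store.Flush()                            // in a real process, Run would do this on each tick
	fmt.Println(store.Count("workers_idle")) // 2
	store.Flush()
	fmt.Println(store.Count("workers_idle")) // 0: no values arrived in the second interval
}
```

Because every scraper reads the count published at the last tick, several Prometheus servers scraping the same exporter would see the same value in this sketch, which is the property option 1 could not offer.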
Are you thinking of "reaaaally long time" being the default? At least in our case that wouldn't be an issue. To do the count you'd need a map or a HyperLogLog or something to count the unique instances, and I suppose for some people that could grow excessively. Our set sizes are around 100, so a long flush time would just keep counting the same objects repeatedly, with no unbounded growth. statsd uses a Set to count objects rather than a HyperLogLog or anything fancy, so they're not worried about unbounded growth.
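A small sketch of the bounded-growth argument, assuming a map-backed set and made-up worker IDs of the form `worker-N`: re-adding the same 100 IDs over many reporting rounds leaves the set at 100 entries, however long the interval runs.

```go
package main

import "fmt"

// uniqueAfter simulates `rounds` reporting rounds in which the same `workers`
// IDs are added to a map-backed set, and returns the resulting set size.
func uniqueAfter(workers, rounds int) int {
	set := make(map[string]struct{})
	for r := 0; r < rounds; r++ {
		for w := 0; w < workers; w++ {
			set[fmt.Sprintf("worker-%d", w)] = struct{}{}
		}
	}
	return len(set)
}

func main() {
	fmt.Println(uniqueAfter(100, 1))    // 100
	fmt.Println(uniqueAfter(100, 1000)) // 100: memory depends on distinct values, not on interval length
}
```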
Better answer: we won't have a count until after the first flush interval expires.
Hmmm, I think I get what sets do now. I'll try to explain it back to make sure: When sending
then the next time statsd flushes to graphite, it sends
again, then on the next flush it will send
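As a hypothetical illustration of the set semantics discussed here, the sketch below parses statsd set lines of the form `<name>:<value>|s` and reports the unique count per flush interval. The metric name `workers.idle`, the values, and the helper names are made up for the example and are not taken from the thread; sample rates and other line variants are not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseSetLine splits a statsd line of the form "<name>:<value>|s".
func parseSetLine(line string) (name, value string, err error) {
	body, typ, ok := strings.Cut(line, "|")
	if !ok || typ != "s" {
		return "", "", errors.New("not a set line")
	}
	name, value, ok = strings.Cut(body, ":")
	if !ok || name == "" {
		return "", "", errors.New("missing name or value")
	}
	return name, value, nil
}

// flushCounts takes the lines received in each flush interval and returns the
// unique-value count reported at the end of each interval. For brevity it
// assumes every line belongs to the same set.
func flushCounts(intervals [][]string) []int {
	counts := make([]int, 0, len(intervals))
	for _, lines := range intervals {
		seen := make(map[string]struct{})
		for _, l := range lines {
			if _, v, err := parseSetLine(l); err == nil {
				seen[v] = struct{}{}
			}
		}
		counts = append(counts, len(seen))
	}
	return counts
}

func main() {
	intervals := [][]string{
		{"workers.idle:w1|s", "workers.idle:w2|s", "workers.idle:w1|s"},
		{"workers.idle:w1|s"},
		{}, // nothing sent during this interval
	}
	fmt.Println(flushCounts(intervals)) // [2 1 0]
}
```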
From the implementation side, I don't see a problem with option 3. I don't think we need HyperLogLog or anything. My concern is whether this will actually produce a metric that is useful to anyone. The obvious output would be a gauge with the set count, but then I wonder how one would actually use that. And if Prometheus scrapes less frequently than the flush interval, or the alignments are odd, you'd completely lose the information about a flush interval. Would it make sense to also observe the count for each interval into a histogram?
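A rough sketch of the histogram idea, using a plain-Go stand-in rather than the Prometheus client library: each flush observes the set count into cumulative buckets and also updates a gauge-like `last` field. `CountHistogram` and the bucket bounds are assumptions chosen for illustration.

```go
package main

import "fmt"

// CountHistogram is a hypothetical cumulative histogram of per-interval set counts.
type CountHistogram struct {
	bounds  []float64 // bucket upper bounds, ascending
	buckets []uint64  // buckets[i] = number of observations <= bounds[i]
	count   uint64    // total number of observations
	sum     float64   // sum of all observed values
	last    float64   // most recent observation, i.e. what a plain gauge would expose
}

func NewCountHistogram(bounds ...float64) *CountHistogram {
	return &CountHistogram{bounds: bounds, buckets: make([]uint64, len(bounds))}
}

// Observe records the set count of one flush interval.
func (h *CountHistogram) Observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.buckets[i]++
		}
	}
	h.count++
	h.sum += v
	h.last = v
}

func main() {
	h := NewCountHistogram(10, 50, 100)
	// Set counts from four consecutive flush intervals.
	for _, c := range []float64{100, 40, 5, 90} {
		h.Observe(c)
	}
	fmt.Println(h.last)    // 90: a scrape that only sees the gauge misses the dip to 5
	fmt.Println(h.buckets) // [1 2 4]: the histogram still records one interval at or below 10
}
```

The gauge alone loses intervals that fall between scrapes, while the cumulative buckets keep a trace of every flush regardless of scrape alignment.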
We're using strings for our sets. Looking at the statsd source code, it appears that they're storing all values as strings. As for your concern, isn't this the same issue that pretty much every gauge will have? A gauge is typically a continuous signal sampled at a regular period and pushed to statsd. This is then scraped at a different period. So if it's pushed more frequently than it's scraped, samples will be dropped. Annoying, but given that it's a continuous signal, there is an infinite number of potential samples that we're necessarily dropping. In our case, the signal we're measuring with the set is also a continuous signal. It's the number of idle workers. Each worker periodically sends its ID to statsd while it's idle. So we have 3 periods we have to contend with! In our case, the worker reporting period is 10 seconds and the flush interval is 60 seconds, so we can tolerate up to 5 dropped packets. Our scrape interval is 30 seconds (the default; haven't found a reason to tune it yet). The measurement we care about is the minimum. We don't want to run out of available workers. So yes, if the scrape interval becomes a significant multiple of the flush interval then the dropped samples might be painful. However, I think this is less of a concern for sets than it would be for other gauge users. Most gauges are easier to sample more frequently than a set is. For example, one could sample a temperature ten times per second. So I think your concern is quite independent of sets. It'd probably be quite useful to add the ability to histogram gauges, along with sets as a specific type of gauge. But I don't think that request belongs under this specific issue.
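The arithmetic behind the three periods can be sketched as follows. The helper names are hypothetical, and the calculation assumes evenly spaced reports and intervals that divide each other exactly, which real traffic will not guarantee.

```go
package main

import (
	"fmt"
	"time"
)

// Periods quoted in the comment above.
const (
	reportPeriod   = 10 * time.Second
	flushInterval  = 60 * time.Second
	scrapeInterval = 30 * time.Second
)

// tolerableDrops returns how many of one worker's reports can be lost within
// a flush interval while the worker is still counted in that interval.
func tolerableDrops(report, flush time.Duration) int {
	n := int(flush / report) // reports sent per flush interval
	if n < 1 {
		return 0
	}
	return n - 1
}

// missedFlushes returns how many flushed values are overwritten between two
// consecutive scrapes; only the most recent flush is visible to a scrape.
func missedFlushes(flush, scrape time.Duration) int {
	n := int(scrape / flush) // flushes per scrape interval
	if n <= 1 {
		return 0
	}
	return n - 1
}

func main() {
	fmt.Println(tolerableDrops(reportPeriod, flushInterval))    // 5
	fmt.Println(missedFlushes(flushInterval, scrapeInterval))   // 0: scraping is faster than flushing
	fmt.Println(missedFlushes(flushInterval, 5*flushInterval)) // 4: a 5-minute scrape sees 1 of every 5 flushes
}
```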
Ok, that makes sense, and that's an interesting use case! Do you feel up to implementing this? As I said above, I think a simple map to hold the set will do to start with. To keep things simple for users I would turn it on and set a reasonable default on the flush interval – 1 minute maybe? If there are no recorded sets then there won't be anything to clean up so it won't use measurable resources. One thing to keep in mind is not to leak too many goroutines when reloading the configuration – one way to do that would be to trigger a last flush on reload and then tear down any flushing routines. (This is a suggestion – if you think there's a better way then do that!)
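One way the reload suggestion could be sketched, assuming a hypothetical `Flusher` type that owns a single goroutine: stopping it triggers one last flush and then waits for the goroutine to exit, so a configuration reload never leaves an old ticker running. The one-minute default mirrors the value floated above; all names are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultFlushInterval follows the "1 minute maybe?" suggestion; an assumption, not a decided value.
const defaultFlushInterval = time.Minute

// Flusher calls a flush function on every tick until it is stopped.
type Flusher struct {
	stop chan struct{}
	done chan struct{}
	once sync.Once
}

// StartFlusher launches the flushing goroutine.
func StartFlusher(interval time.Duration, flush func()) *Flusher {
	f := &Flusher{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(f.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				flush()
			case <-f.stop:
				flush() // last flush, so the final partial interval is not lost
				return
			}
		}
	}()
	return f
}

// Stop triggers the last flush and blocks until the goroutine has exited.
// It is safe to call more than once.
func (f *Flusher) Stop() {
	f.once.Do(func() { close(f.stop) })
	<-f.done
}

func main() {
	flushes := 0
	f := StartFlusher(defaultFlushInterval, func() { flushes++ })

	// Simulated configuration reload: tear down the old flusher first,
	// then start a fresh one with the (possibly changed) interval.
	f.Stop()
	f = StartFlusher(defaultFlushInterval, func() { flushes++ })
	f.Stop()

	fmt.Println(flushes) // 2: one final flush per stopped flusher
}
```

Waiting on `done` inside `Stop` is what keeps goroutines from accumulating across reloads: the old routine is guaranteed to be gone before a new one starts.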
That's a good question! I don't have anything beyond a superficial exposure to Go, but I don't expect that to be much of a stumbling block. And the time I have to do this sort of thing is fairly limited. As I said, option 2 would definitely be easier and sufficient for us, but I'd definitely be interested in doing option 3 if you're willing to provide guidance.
Absolutely! My Go is also … functional, at best, but we'll work through it 😄 Open a PR early and I can give feedback. If you allow edits from maintainers I can also make changes to it directly, if opportune.