Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make prometheus serializer update timestamps and expiration time as new data arrives #9139

Conversation

jakemcc
Copy link
Contributor

@jakemcc jakemcc commented Apr 16, 2021

  • Updated associated README.md.
  • Wrote appropriate unit tests.

resolves #8170

Replaces #8257

#8257 partially implements a fix for #8170 but this PR expands it to also resolve the expiration issue to the Summary type.

It also updates both the Histogram and Summary timestampMs field when new data arrives for a particular Histogram or Summary. I talk about this problem in this comment on #8170. Below expands on it.

If you have configured [[outputs.prometheus_client]] to export_timestamps = true then without the updating of timestampMs on new data coming in you end up in a situation like below.

Say you are pushing prometheus data to telegraf. The first batch might look something like below with a timestamp of 1617722541063

metric_name{application="the-application",quantile="0.5"} 1 1617722540000
metric_name{application="the-application",quantile="0.75"} 2 1617722540000
metric_name{application="the-application",quantile="0.95"} 3 1617722540000
metric_name{application="the-application",quantile="0.98"} 3.3 1617722540000
metric_name{application="the-application",quantile="0.99"} 3.4 1617722540000
metric_name{application="the-application",quantile="0.999"} 10 1617722540000
metric_name_sum{application="the-application"} 207.501625198 1617722540000
metric_name_count{application="the-application"} 18000 1617722540000

About 10 seconds later, you publish an update to telegraf.

metric_name{application="the-application",quantile="0.5"} 0.9  1617722550000
metric_name{application="the-application",quantile="0.75"} 1.5 1617722550000
metric_name{application="the-application",quantile="0.95"} 2 1617722550000
metric_name{application="the-application",quantile="0.98"} 3 1617722550000
metric_name{application="the-application",quantile="0.99"} 3.5 1617722550000
metric_name{application="the-application",quantile="0.999"} 10 1617722550000
metric_name_sum{application="the-application"} 310 1617722550000
metric_name_count{application="the-application"} 18281 1617722550000

If telegraf is then scrapped by prometheus and export_timestamp = true, the initial first received timestamp is exported instead of the correct second timestamp. Updating timestampMs as new data comes in resolves this issue.

Jake McCrary added 3 commits April 15, 2021 17:42
This does potentially **never** expire old data. If an old quantile or
bucket is never updated it will never be expired if new data
comes in. Potentially requiring a restart of telegraf to get rid of it
@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Apr 16, 2021
Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤝 ✅ CLA has been signed. Thank you!

Copy link
Contributor

@ivorybilled ivorybilled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

Copy link
Contributor

@reimda reimda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me for expiration to be relative to to the last time a histogram or summary is gathered instead of the first time. Thanks @jakemcc for adding comprehensive tests for this.

@reimda
Copy link
Contributor

reimda commented May 4, 2021

Hey @ssoroka, you commented on #8170 a few months ago, would you like to review this fix?

@jakemcc
Copy link
Contributor Author

jakemcc commented Jun 8, 2021

We've been running with a fork that includes these changes since a little before I submitted this PR. It has been working well for us and, as expected, has resolved the gaps mentioned in the referenced issues.

@jakemcc jakemcc requested a review from reimda June 10, 2021 22:19
@jjh74
Copy link
Contributor

jjh74 commented Sep 2, 2021

@reimda / @ssoroka Any updates on this ?
I've been running a fork with subset/previous version of fix(#8257) for nearly a year. Without this PR prometheus histograms are not useful.

@reimda reimda merged commit 514a942 into influxdata:master Sep 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

outputs.prometheus_client (metric_version=2) removes(still existing) histograms every expiration_interval
4 participants