Make prometheus serializer update timestamps and expiration time as new data arrives #9139

jakemcc · 2021-04-16T19:24:36Z

Updated associated README.md.
Wrote appropriate unit tests.

resolves #8170

Replaces #8257

#8257 partially implements a fix for #8170, but this PR expands it to also resolve the expiration issue for the Summary type.

It also updates the timestampMs field of both Histograms and Summaries when new data arrives for a particular Histogram or Summary. I talk about this problem in this comment on #8170; the explanation below expands on it.

If you have configured [[outputs.prometheus_client]] with export_timestamp = true, then without updating timestampMs as new data comes in you end up in a situation like the one below.

Say you are pushing prometheus data to telegraf. The first batch might look something like the one below, with a timestamp of 1617722540000:

metric_name{application="the-application",quantile="0.5"} 1 1617722540000
metric_name{application="the-application",quantile="0.75"} 2 1617722540000
metric_name{application="the-application",quantile="0.95"} 3 1617722540000
metric_name{application="the-application",quantile="0.98"} 3.3 1617722540000
metric_name{application="the-application",quantile="0.99"} 3.4 1617722540000
metric_name{application="the-application",quantile="0.999"} 10 1617722540000
metric_name_sum{application="the-application"} 207.501625198 1617722540000
metric_name_count{application="the-application"} 18000 1617722540000

About 10 seconds later, you publish an update to telegraf.

metric_name{application="the-application",quantile="0.5"} 0.9 1617722550000
metric_name{application="the-application",quantile="0.75"} 1.5 1617722550000
metric_name{application="the-application",quantile="0.95"} 2 1617722550000
metric_name{application="the-application",quantile="0.98"} 3 1617722550000
metric_name{application="the-application",quantile="0.99"} 3.5 1617722550000
metric_name{application="the-application",quantile="0.999"} 10 1617722550000
metric_name_sum{application="the-application"} 310 1617722550000
metric_name_count{application="the-application"} 18281 1617722550000

If telegraf is then scraped by prometheus and export_timestamp = true, the timestamp from the first batch is exported instead of the correct timestamp from the second batch. Updating timestampMs as new data comes in resolves this issue.
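To make the timestamp behavior concrete, here is a minimal sketch of the idea. It is not the actual serializer code: the `summary` and `collection` types and the `addQuantile` method are hypothetical stand-ins, and only the field name timestampMs is taken from the description above.

```go
package main

import "fmt"

// summary is a hypothetical stand-in for the serializer's per-series state.
type summary struct {
	quantiles   map[float64]float64
	timestampMs int64 // exported when export_timestamp = true
}

// collection keeps one summary per series key (hypothetical).
type collection struct {
	summaries map[string]*summary
}

func newCollection() *collection {
	return &collection{summaries: make(map[string]*summary)}
}

// addQuantile stores the latest value for a quantile and moves the
// summary's timestamp forward when the incoming sample is newer.
// The timestamp starts at zero, so the first sample always sets it.
func (c *collection) addQuantile(key string, quantile, value float64, timestampMs int64) {
	s, ok := c.summaries[key]
	if !ok {
		s = &summary{quantiles: make(map[float64]float64)}
		c.summaries[key] = s
	}
	s.quantiles[quantile] = value
	if timestampMs > s.timestampMs {
		s.timestampMs = timestampMs
	}
}

func main() {
	c := newCollection()
	key := `metric_name{application="the-application"}`

	// First batch, then the update about 10 seconds later.
	c.addQuantile(key, 0.5, 1, 1617722540000)
	c.addQuantile(key, 0.5, 0.9, 1617722550000)

	fmt.Println(c.summaries[key].timestampMs) // prints 1617722550000
}
```

Without the `timestampMs > s.timestampMs` update, the sketch would keep exporting 1617722540000 after the second batch, which is the situation described above.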

One caveat: this change can potentially **never** expire old data. If an old quantile or bucket stops receiving updates, it will still never be expired as long as new data keeps coming in for the same Summary or Histogram, potentially requiring a restart of telegraf to get rid of it.
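The expiration trade-off can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the `store` and `histogram` types, `addBucket`, `expire`, and the 60 second interval are all hypothetical. The point is that expiration is tracked per Histogram, so a bucket that stops receiving updates survives for as long as any sibling bucket keeps refreshing the deadline.

```go
package main

import (
	"fmt"
	"time"
)

// histogram is a hypothetical per-series entry with a single expiry deadline.
type histogram struct {
	buckets map[float64]uint64 // upper bound -> cumulative count
	expires time.Time
}

// store holds histograms and the configured expiration interval (hypothetical).
type store struct {
	expiration time.Duration
	histograms map[string]*histogram
}

func newStore(expiration time.Duration) *store {
	return &store{expiration: expiration, histograms: make(map[string]*histogram)}
}

// addBucket updates one bucket and pushes the whole histogram's deadline out.
func (s *store) addBucket(key string, upperBound float64, count uint64, now time.Time) {
	h, ok := s.histograms[key]
	if !ok {
		h = &histogram{buckets: make(map[float64]uint64)}
		s.histograms[key] = h
	}
	h.buckets[upperBound] = count
	h.expires = now.Add(s.expiration)
}

// expire drops histograms whose deadline has passed.
func (s *store) expire(now time.Time) {
	for key, h := range s.histograms {
		if now.After(h.expires) {
			delete(s.histograms, key)
		}
	}
}

// staleBucketDemo reports whether the histogram survives while one bucket is
// still updated, how many buckets it then holds, and whether it survives once
// all updates stop.
func staleBucketDemo() (kept bool, buckets int, keptAfterIdle bool) {
	start := time.Unix(1617722540, 0)
	s := newStore(60 * time.Second)
	s.addBucket("latency", 0.5, 10, start)
	s.addBucket("latency", 1.0, 12, start)
	// Only the 1.0 bucket receives further data.
	s.addBucket("latency", 1.0, 20, start.Add(50*time.Second))

	s.expire(start.Add(90 * time.Second))
	if h, ok := s.histograms["latency"]; ok {
		kept, buckets = true, len(h.buckets)
	}

	s.expire(start.Add(200 * time.Second))
	_, keptAfterIdle = s.histograms["latency"]
	return kept, buckets, keptAfterIdle
}

func main() {
	fmt.Println(staleBucketDemo()) // prints: true 2 false
}
```

In the sketch the stale 0.5 bucket is still present 90 seconds after its last update, because the 1.0 bucket refreshed the deadline. The whole entry only disappears once every bucket has been idle for longer than the interval.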

telegraf-tiger

🤝 ✅ CLA has been signed. Thank you!

ivorybilled

Makes sense to me

reimda

It makes sense to me for expiration to be relative to the last time a histogram or summary is gathered instead of the first time. Thanks @jakemcc for adding comprehensive tests for this.

plugins/serializers/prometheus/collection_test.go

reimda · 2021-05-04T22:06:21Z

Hey @ssoroka, you commented on #8170 a few months ago, would you like to review this fix?

jakemcc · 2021-06-08T19:58:40Z

We've been running with a fork that includes these changes since a little before I submitted this PR. It has been working well for us and, as expected, has resolved the gaps mentioned in the referenced issues.

telegraf-tiger · 2021-06-08T20:28:20Z

Looks like new artifacts were built from this PR. Get them here!

jjh74 · 2021-09-02T06:57:48Z

@reimda / @ssoroka Any updates on this?
I've been running a fork with a subset/previous version of the fix (#8257) for nearly a year. Without this PR, prometheus histograms are not useful.

Jake McCrary added 3 commits April 15, 2021 17:42

Update Histogram and Summary timestamps when new data is collected

5404bb7

Stop expiring Summaries and Histograms receiving updates

319e6e3

This does potentially **never** expire old data. If an old quantile or bucket is never updated it will never be expired if new data comes in. Potentially requiring a restart of telegraf to get rid of it

Update readme to mention behavior of expiration times

a8563ce

telegraf-tiger bot added the fix pr to fix corresponding bug label Apr 16, 2021

telegraf-tiger bot approved these changes Apr 16, 2021

ivorybilled approved these changes Apr 19, 2021

reimda suggested changes May 4, 2021

jjh74 mentioned this pull request May 19, 2021

Don't remove prometheus histograms on expire_interval #8257

Closed

3 tasks

timestamps now and addtime are after metrics' timestamps

b29e41a

jakemcc requested a review from reimda June 10, 2021 22:19

reimda approved these changes Sep 2, 2021

reimda merged commit 514a942 into influxdata:master Sep 2, 2021
