-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent empty OTLP's Int/DoubleSum from being sent #1976
Comments
This sounds like a bug, but I'm not sure what the collector expects in these cases. @bogdandrutu do you know? |
@carlosalberto I'm not aware of anything in the collector that would make use of metricdata if there are zero datapoints. However I was wondering why is it generating empty metricdata instead of having a datapoint with value of zero? |
Currently our Sum aggregations are all cumulative, so a zero report would mean that the counter had been reset, which would not be correct. |
If it's a cumulative shouldn't a datapoint value always be reported each interval then? Even if the value is unchanged it should still be sent as datapoints can be dropped at the collector or further upstream. |
Yes, it is generally more robust to report something on a regular interval, even if it hasn't changed. It might not be as optimal but it will be much more robust for the backends that ultimately consume this data. If there are any delta values where it reports only the difference since the last report, it would be good to emit 0s as well. |
I see that we need to add some specification to clarify this matter. In the OTel-Go SDK the Processor component has two distinct behaviors:
Both of these mechanisms are needed for a Prometheus exporter that doesn't otherwise manage its own state. When we discuss what the behavior should be for OTLP export, both (1) and (2) are valid behaviors. open-telemetry/opentelemetry-specification#731 states that (1) should be on by default, but we haven't specified what to do about (2). I'm afraid this has to be tackled in the specification, but's tangential to the question here. One thing we can say, I think, is that we should never send empty data points. Either send the current value because it has changed or because of a memory configuration (2), or do not send a metric at all, but never send a metric without points. We expect to not Sum data points when reporting DELTAs and the value is zero, I would say that SDKs SHOULD not send 0-valued DELTAs, but this implicitly says something about memory behavior. We might simply state for the metrics SDK specification that memory behavior should be enabled when reporting CUMULATIVE temporality, but this is a stretch: the question is when should an OTLP exporter report old values for Gauges that were previously reported. Note that Gauge points do not have temporality, so the rule "memory when cumulative" does not literally apply. @carlosalberto would be happy for you to file a spec issue on this. 😀 |
Just to clarify: I am not aware of any reason ever to send an empty data point. @jkwatson I would not interpret an "empty" point as a timestamp reset. The StartTimeUnixNano field should be used for the start timestamp and is independent of whether there are points. There is a legitimate behavior, which I call "Memory", that tells the SDK to continue to export values for metrics that are not updated during an interval and remember them indefinitely. This behavior is generally paired with cumulative reporting, and it is required for Prometheus. When configuring the SDK for cumulative-reporting-with-memory, it should continue to report the last value (for gauges) and cumulative values (for counters and histograms) I would argue that this "Memory" behavior is a choice: whether to report no metric or to report an old value, but in either case we should not report an empty Metric. Although this is not written into the SDK specification (there should be a spec issue on this topic), but I'll point you to the optional behavior in OTel-Go, where it is built in to the basic Processor component: https://github.com/open-telemetry/opentelemetry-go/blob/9ac3a08eef14d96687edb260e2c27114a658d6b7/sdk/metric/processor/basic/config.go#L34 |
I wasn't saying that an empty point would work that way, but one with an explicit '0' sent as the point's value. |
BTW, I believe this "view" PR would let us decide to always send the last value recorded, for last-value aggregations, even if nothing had been reported in an interval, by specifying the "CUMULATIVE" temporality for whatever you need: #2037 (It wouldn't prevent empty records, but that's a separate issue) |
@jmacd Are you saying if a delta time series only reports 0 it would never send any datapoints? Or if it reports 10, then 0, then 20, it only reports the 10 and 20 but not the 0 between? |
@jrcamp I think it is not about reporting "0" it is about reporting or not anything at all, I think this is a sequence of events and what to export: -> t0 |
I agree with @bogdandrutu's example and would like us to specify this behavior. When there is a metric event, a value should be exported regardless of whether it matches the empty state. We will say SDKs MUST (or SHOULD) emit 0-valued DELTA-AggregationTemporality points and SHOULD/MUST NOT emit an empty state in these cases? I.e., in this sequence we must export delta "0":
|
If we specify that all delta-temporality aggregations must report "zero valued" values every report cycle if no recordings are made during that cycle, that will impact all exporters, not just OTLP. Are we ok with that? Also, this does mean that the SDK must hold on to all references to all instruments for the lifetime of the process. Is that ok? |
No, I don't think so. I believe the point being made is that zero-valued deltas SHOULD be reported IFF a recording is made. In open-telemetry/opentelemetry-specification#1198 I am trying to make this requirement clear: if "no recording is made" after some time the Accumulator SHOULD erase its memory of this instrument. The discussion here refines this requirement to assert that the Accumulator MUST output an Accumulation for any instrument/labelset that is used (i.e., "recorder") during an interval and that the Accumulator MUST NOT output an Accumulation for any instrument/labelset that is not used during an interval. The point of open-telemetry/opentelemetry-specification#1198 is that the Processor is the place that MAY be configured to keep indefinite state, for one of two reasons: (1) because DELTA->CUMULATIVE was requested, (2) because Memory was requested. Another way to consider this requirement: the SDK SHOULD be factored into Accumulator/Processor/Exporter such that an implementation of Prometheus export is achieved where both the Accumulator and the Exporter are stateless. |
Ah, ok. I misunderstood what was being proposed. Totally fine with reporting something is something is recorded, and not if not (for delta temporality in the Processor aggregator). |
@carlosalberto can we close this now? |
closing this due to inactivity, and having had a related PR merged. |
Currently, if a given counter has no reported values in a given interval (that is,
MetricData.getPoints().isEmpty == true
), it will be sent anyway (was OTLP'sIntSum
/DoubleSum
without any data point at all).Is this needed at all? Maybe we can simply ignore them in such a case? Or is there any reason why we keep on sending them, even if empty?
The text was updated successfully, but these errors were encountered: